arXiv:1504.00233v3 [quant-ph] 18 Oct 2015 


Marco Tomamichel 


Quantum Information Processing with 
Finite Resources 


Mathematical Foundations 


October 18, 2015 


Springer 




Acknowledgements 


I was introduced to quantum information theory during my PhD studies in Renato Renner’s group 
at ETH Zurich. It is from him that I learned most of what I know about quantum cryptography 
and smooth entropies. Renato also got me interested more generally in finite resource information 
theory as well as the entropies and other information measures that come along with it. He shaped 
my views on the topic and gave me all the tools I needed to follow my own ideas, and I am 
immensely grateful for that. A particular question that grew out of my PhD studies was whether 
one can understand smooth entropies in a broader framework of quantum Renyi entropies. This 
book now accomplishes this. 

Joining Stephanie Wehner’s group at the Centre for Quantum Technologies in Singapore for 
my postdoctoral studies was probably the best thing I could have done. I profited tremendously 
from interactions with Stephanie and her group about many of the topics covered in this book. 
On top of that, I was also given the freedom to follow my own interests and collaborate with 
great researchers in Singapore. In particular I want to thank Masahito Hayashi for sharing a small 
part of his profound knowledge of quantum statistics and quantum information theory with me. 
Vincent Y. F. Tan taught me much of what I know about classical information theory, and I greatly 
enjoyed talking to him about finite blocklength effects. 

Renato Renner, Mark M. Wilde, and Andreas Winter encouraged me to write this book. It is my 
pleasure to thank Christopher T. Chubb and Mark M. Wilde for carefully reading the manuscript 
and spotting many typos. I want to thank Rupert L. Frank, Elliott H. Lieb, Milan Mosonyi, and 
Renato Renner for many insightful comments and suggestions on an earlier draft. While writing 
I also greatly enjoyed and profited from scientific discussions with Mario Berta, Frederic Dupuis, 
Anthony Leverrier, and Volkher B. Scholz about different aspects of this book. I also want to 
acknowledge support from a University of Sydney Postdoctoral Fellowship as well as the Centre 
of Excellence for Engineered Quantum Systems in Australia. 

Last but not least, I want to thank my wife Thanh Nguyet for supporting me during this time, 
even when the writing continued long after the office hours ended. I dedicate this book to the 
memory of my grandfathers, Franz Wagenbach and Giuseppe B. Tomamichel, the engineers in 
the family. 




Contents 


1 Introduction

1.1 Finite Resource Information Theory

1.2 Motivating Example

1.3 Outline of the Book

2 Modeling Quantum Information

2.1 General Remarks on Notation

2.2 Linear Operators and Events

2.2.1 Hilbert Spaces and Linear Operators

2.2.2 Events and Measures

2.3 Functionals and States

2.3.1 Trace and Trace-Class Operators

2.3.2 States and Density Operators

2.4 Multi-Partite Systems

2.4.1 Tensor Product Spaces

2.4.2 Separable States and Entanglement

2.4.3 Purification

2.4.4 Classical-Quantum Systems

2.5 Functions on Positive Operators

2.6 Quantum Channels

2.6.1 Completely Bounded Maps

2.6.2 Quantum Channels

2.6.3 Pinching and Dephasing Channels

2.6.4 Channel Representations

2.7 Background and Further Reading

3 Norms and Metrics

3.1 Norms for Operators and Quantum States

3.1.1 Schatten Norms

3.1.2 Dual Norm For States

3.2 Trace Distance

3.3 Fidelity

3.3.1 Generalized Fidelity


3.4 Purified Distance

3.5 Background and Further Reading

4 Quantum Renyi Divergence

4.1 Classical Renyi Divergence

4.1.1 An Axiomatic Approach

4.1.2 Positive Definiteness and Data-Processing

4.1.3 Monotonicity in α and Limits

4.2 Classifying Quantum Renyi Divergences

4.2.1 Joint Concavity and Data-Processing

4.2.2 Minimal Quantum Renyi Divergence

4.2.3 Maximal Quantum Renyi Divergence

4.2.4 Quantum Max-Divergence

4.3 Minimal Quantum Renyi Divergence

4.3.1 Pinching Inequalities

4.3.2 Limits and Special Cases

4.3.3 Data-Processing Inequality

4.4 Petz Quantum Renyi Divergence

4.4.1 Data-Processing Inequality

4.4.2 Nussbaum-Szkola Distributions

4.5 Background and Further Reading

5 Conditional Renyi Entropy

5.1 Conditional Entropy from Divergence

5.2 Definitions and Properties

5.2.1 Alternative Expression for [symbol illegible in source]

5.2.2 Conditioning on Classical Information

5.2.3 Data-Processing Inequalities and Concavity

5.3 Duality Relations and their Applications

5.3.1 Duality Relation for [symbol illegible in source]

5.3.2 Duality Relation for [symbol illegible in source]

5.3.3 Duality Relation for [symbols illegible in source]

5.3.4 Additivity for Tensor Product States

5.3.5 Lower and Upper Bounds on Quantum Renyi Entropy

5.4 Chain Rules

5.5 Background and Further Reading

6 Smooth Entropy Calculus

6.1 Min- and Max-Entropy

6.1.1 Semi-Definite Programs

6.1.2 The Min-Entropy

6.1.3 The Max-Entropy

6.1.4 Classical Information and Guessing Probability

6.2 Smooth Entropies

6.2.1 Definition of the Smoothing Ball

6.2.2 Definition of Smooth Entropies

6.2.3 Remarks on Smoothing















































6.3 Properties of the Smooth Entropies

6.3.1 Duality Relation and Beyond

6.3.2 Chain Rules

6.3.3 Data-Processing Inequalities

6.4 Fully Quantum Asymptotic Equipartition Property

6.4.1 Lower Bounds on the Smooth Min-Entropy

6.4.2 The Asymptotic Equipartition Property

6.5 Background and Further Reading

7 Selected Applications

7.1 Binary Quantum Hypothesis Testing

7.1.1 Chernoff Bound

7.1.2 Stein’s Lemma

7.1.3 Hoeffding Bound and Strong Converse Exponent

7.2 Entropic Uncertainty Relations

7.3 Randomness Extraction

7.3.1 Uniform and Independent Randomness

7.3.2 Direct Bound: Leftover Hash Lemma

7.3.3 Converse Bound

7.4 Background and Further Reading

A Some Fundamental Results in Matrix Analysis

References
























Chapter 1 

Introduction 


As we further miniaturize information processing devices, the impact of quantum effects will
become more and more relevant. Information processing at the microscopic scale poses challenges
but also offers various opportunities: How much information can be transmitted through a
physical communication channel if we can encode and decode our information using a quantum
computer? How can we take advantage of entanglement, a form of correlation stronger than what
is allowed by classical physics? What are the implications of Heisenberg’s uncertainty principle
of quantum mechanics for cryptographic security? These are only a few amongst the many
questions studied in the emergent field of quantum information theory.

One of the predominant challenges when engineering future quantum information processors 
is that large quantum systems are notoriously hard to maintain in a coherent state and difficult to 
control accurately. Hence, it is prudent to expect that there will be severe limitations on the size 
of quantum devices for the foreseeable future. It is therefore of immediate practical relevance to 
investigate quantum information processing with limited physical resources, for example, to ask: 


How well can we perform information processing tasks if we only have access to a small 
quantum device? Can we beat fundamental limits imposed on information processing with 
non-quantum resources? 


This book will introduce the reader to the mathematical framework required to answer such 
questions, and many others. In quantum cryptography we want to show that a key of finite length 
is secret from an adversary, in quantum metrology we want to infer properties of a small quantum 
system from a finite sample, and in quantum thermodynamics we explore the thermodynamic 
properties of small quantum systems. What all these applications have in common is that they 
concern properties of small quantum devices and require precise statements that remain valid 
outside asymptopia — the idealized asymptotic regime where the system size is unbounded. 


1.1 Finite Resource Information Theory 

Through the lens of a physicist it is natural to see Shannon’s information theory [144] as a resource
theory. Data sources and communication channels are traditional examples of resources in
information theory, and its goal is to investigate how these resources are interrelated and how they
can be transformed into each other. For example, we aim to compress a data source that contains
redundancy into one that does not, or to transform a noisy channel into a noiseless one. Information
theory quantifies how well this can be done and in particular provides us with fundamental
limits on the best possible performance of any transformation.

Shannon’s initial work [144] already gives definite answers to the above example questions 
in the asymptotic regime where resources are unbounded. This means that we can use the input 
resource as many times as we wish and are interested in the rate (the fraction of output to input
resource) at which transformations can occur. The resulting statements can be seen as a first
approximation to a more realistic setting where resources are necessarily finite, and this approximation
is indeed often sufficient for practical purposes.

However, as argued above, specifically when quantum resources are involved we would like 
to establish more precise statements that remain valid even when the available resources are very 
limited. This is the goal of finite resource information theory. The added difficulty in the finite 
setting is that we are often not able to produce the output resource perfectly. The best we can 
hope for is to find a tradeoff between the transformation rate and the error we allow on the output
resource. In the most fundamental one-shot setting we only consider a single use of the input 
resource and are interested in the tradeoff between the amount of output resource we can produce 
and the incurred error. We can then see the finite resource setting as a special case of the one-shot 
setting where the input resource has additional structure, for example a source that produces a 
sequence of independent and identically distributed (iid) symbols or a channel that is memoryless 
or ergodic. 

Notably such considerations were part of the development of information theory from the outset.
They motivated the study of error exponents, for example by Gallager [63]. Roughly speaking,
error exponents approximate how fast the error vanishes for a fixed transformation rate as the
number of available resources increases. However, these statements are fundamentally asymptotic
in nature and make strong assumptions on the structure of the resources. Beyond that, Han and
Verdú established the information spectrum method [69,70], which allows one to consider unstructured
resources but is asymptotic in nature. More recently finite resource information theory has
attracted considerable renewed attention, for example due to the works of Hayashi [77,78] and
Polyanskiy et al. [133]. The approach in these works, based on Strassen’s techniques [148],
is motivated operationally: in many applications we can admit a small, fixed error and our goal
is to find the maximal possible transformation rate as a function of the error and the amount of
available resource.^1

In an independent development, approximate or asymptotic statements were also found to 
be insufficient in the context of cryptography. In particular the advent of quantum cryptography
[18,51] motivated a precise information-theoretic treatment of the security of secret keys of
finite length [99,139]. In the context of quantum cryptography many of the standard assumptions
in information theory are no longer valid if one wants to avoid any assumptions on the eavesdropper’s
actions. In particular, the common assumption that resources are iid or ergodic is hardly
justified. In quantum cryptography we are instead specifically interested in the one-shot setting, 
where we want to understand how much (almost) secret key can be extracted from a single use of 
an unstructured resource. 


The abstract view of finite resource information theory as a resource theory also reveals why it
has found various applications in physical resource theories, most prominently in thermodynamics
(see, e.g., [30,47,52] and references therein).

^1 The topic has also been reviewed recently by Tan [151].


Renyi and Smooth Entropies 

The main focus of this book will be on various measures of entropy and information that underlie
finite resource information theory, in particular Renyi and smooth entropies. The concept
of entropy has its origins in physics, in particular in the works of Boltzmann [28] and Gibbs [66]
on thermodynamics. Von Neumann [170] generalized these concepts to quantum systems. Later
Shannon [144], well aware of the origins of entropy in physics, interpreted entropy as a measure
of uncertainty of the outcome of a random experiment. He found that entropy, or Shannon
entropy as it is called now in the context of information theory^2, characterizes the optimal asymptotic
rate at which information can be compressed. However, we will soon see that it is necessary
to consider alternative information measures if we want to move away from asymptotic statements.

Error exponents can often be expressed in terms of Renyi entropies [142] or related information
measures, which partly explains the central importance of this one-parameter family of entropies 
in information theory. Renyi entropies share many mathematical properties with the Shannon 
entropy and are powerful tools in many information-theoretic arguments. A significant part of 
this book is thus devoted to exploring quantum generalizations of Renyi entropies, for example 
the ones proposed by Petz [132] and a more recent specimen [122,175] that has already found 
many applications. 

The particular problems encountered in cryptography led to the development of smooth entropies
[141] and their quantum generalizations [139,140]. Most importantly, the smooth min-entropy
captures the amount of uniform randomness that can be extracted from an unstructured
source if we allow for a small error. (This example is discussed in detail in Section 7.3.) The 
smooth entropies are variants of Renyi entropies and inherit many of their properties. They have 
since found various applications ranging from information theory to quantum thermodynamics 
and will be the topic of the second part of this book. 

We will further motivate the study of these information measures with a simple example in the 
next section. 

Besides their operational significance, there are other reasons why the study of information 
measures is particularly relevant in quantum information theory. Many standard arguments in 
information theory can be formulated in terms of entropies, and often this formulation is most
amenable to a generalization to the quantum setting. For example, conditional entropies provide 
us with a measure of the uncertainty inherent in a quantum state from the perspective of an 
observer with access to side information. This allows us to circumvent the problem that we do not 
have a suitable notion of conditional probabilities in quantum mechanics. As another example, 
arguments based on typicality and the asymptotic equipartition property can be phrased in terms 
of smooth entropies which often leads to a more concise and intuitive exposition. Finally, the 
study of quantum generalizations of information measures sometimes also gives new insights 
into the classical quantities. For example, our definitions and discussions of conditional Renyi
entropy also apply to the classical special case where such definitions have not yet been firmly
established.

^2 Notwithstanding the historical development, we follow the established tradition and use Shannon entropy to refer
to entropy. We use von Neumann entropy to refer to its quantum generalization.


1.2 Motivating Example: Source Compression 

We are using notation that will be formally introduced in Chapter 2 and concepts that will be expanded
on in later chapters (cf. Table 1.1). A data source is described probabilistically as follows.
Let $X$ be a random variable with distribution $p_X(x) = \Pr[X = x]$ that models the distribution of
the different symbols that the source emits. The number of bits of memory needed to store one
symbol produced by this source so that it can be recovered with certainty is given by $\lceil H_0(X)_p \rceil$,
where $H_0(X)_p$ denotes the Hartley entropy [72] of $X$, defined as

$$H_0(X)_p = \log_2 \big| \{ x : p_X(x) > 0 \} \big| . \qquad (1.1)$$

The Hartley entropy is a limiting case of a Renyi entropy [142] and simply measures the cardinality
of the support of $X$. In essence, this means that we can ignore symbols that never occur
but otherwise our knowledge of the distribution of the different symbols does not give us any 
advantage. 



Symbol                      Concept                 To be discussed further in
$H_\alpha$                  Renyi entropy           Chapters 4 and 5
$\Delta(\cdot,\cdot)$       variational distance    Section 3.1, as generalized trace distance
$H_{\max}^{\varepsilon}$    smooth Renyi entropy    Chapter 6, as smooth max-entropy*
-                           entropic AEP            Section 6.4, entropic asymptotic equipartition property

*We will use a different metric for the definition of the smooth max-entropy.

Table 1.1 Reference to detailed discussion of the quantities and concepts mentioned in this section.


As an example, consider a source that outputs lowercase characters of the English alphabet. 
If we want to store a single character produced by this source such that it can be recovered with 
certainty, we clearly need $\lceil \log_2 26 \rceil = 5$ bits of memory as a resource.
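
As a minimal sketch of this calculation, the following Python snippet evaluates the Hartley entropy (1.1) and the resulting number of bits. The function names and the uniform distribution over the letters are our own illustrative assumptions; only the size of the support enters the result.

```python
import math

def hartley_entropy(p):
    """Hartley entropy H_0 in bits: log2 of the number of symbols with p(x) > 0."""
    support_size = sum(1 for px in p.values() if px > 0)
    return math.log2(support_size)

def bits_for_certain_recovery(p):
    """Bits of memory needed to store one symbol with certainty: ceil(H_0)."""
    return math.ceil(hartley_entropy(p))

# Assumed example: a uniform source over the 26 lowercase letters.
alphabet = {chr(ord("a") + i): 1 / 26 for i in range(26)}
print(bits_for_certain_recovery(alphabet))  # 5
```

Any other distribution with the same support yields the same value, which reflects that knowledge of the distribution gives no advantage when recovery must be certain.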


Analysis with Renyi Entropies 

More interestingly, we may ask how much memory we need to store the output of the source if
we allow for a small probability of failure, $\varepsilon \in (0,1)$. To answer this we investigate encoders that
assign codewords of a fixed length $\log_2 m$ (in bits) to the symbols the source produces. These
codewords are then stored and a decoder is later used to compute an estimate of $X$ from the
codewords. If the probability that this estimate equals the original symbol produced by the source
is at least $1 - \varepsilon$, then we call such a scheme an $(\varepsilon, m)$-code. For a source $X$ with probability
distribution $p_X$, we are thus interested in finding the tradeoff between code length, $\log_2 m$, and the
probability of failure, $\varepsilon$, for all $(\varepsilon, m)$-codes.
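
To make this tradeoff concrete, the following sketch computes the smallest number of codewords $m$ for a given failure probability. It assumes a deterministic scheme that stores the $m$ most likely symbols faithfully and fails on all others; the helper names are hypothetical and not taken from the text.

```python
def failure_probability(p, m):
    """Failure probability when only the m most likely symbols get a codeword."""
    kept = sum(sorted(p, reverse=True)[:m])
    return max(0.0, 1.0 - kept)

def minimal_code_size(p, eps):
    """Smallest m for which the scheme above is an (eps, m)-code."""
    for m in range(1, len(p) + 1):
        if failure_probability(p, m) <= eps:
            return m
    return len(p)

# Assumed example distribution on four symbols.
p = [0.5, 0.25, 0.125, 0.125]
for eps in (0.5, 0.25, 0.1):
    print(eps, minimal_code_size(p, eps))
```

Allowing a larger failure probability reduces the required number of codewords, which is the tradeoff studied in the remainder of this section.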

Shannon in his seminal work [144] showed that simply disregarding the most unlikely source
events (on average) leads to an arbitrarily small failure probability if the code length is chosen
sufficiently long. In particular, Gallager’s proof [63,64] implies that $(\varepsilon, m)$-codes always exist as
long as

$$\log_2 m \geq H_\alpha(X)_p + \frac{\alpha}{1-\alpha} \log_2 \frac{1}{\varepsilon} \quad \textrm{for some } \alpha \in \big[\tfrac{1}{2}, 1\big) . \qquad (1.2)$$

Here, $H_\alpha(X)_p$ is the Renyi entropy of order $\alpha$, defined as

$$H_\alpha(X)_p = \frac{1}{1-\alpha} \log_2 \Big( \sum_x p_X(x)^\alpha \Big) \qquad (1.3)$$

for all $\alpha \in (0,1) \cup (1,\infty)$ and as the respective limit for $\alpha \in \{0, 1, \infty\}$. The Renyi entropies are
monotonically decreasing in $\alpha$. Clearly the lower bound in (1.2) thus constitutes a tradeoff: larger
values of the order parameter $\alpha$ lead to a smaller Renyi entropy but will increase the prefactor of the
penalty term $\log_2 \frac{1}{\varepsilon}$. Statements about the existence of codes as in (1.2) are called achievability bounds or
direct bounds.
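
The tradeoff in the order parameter can be explored numerically. The sketch below evaluates the Renyi entropy for orders other than the limiting cases and minimizes the right-hand side of (1.2) over a grid of orders in $[\frac{1}{2}, 1)$. The grid search and all function names are our own assumptions, chosen for illustration only.

```python
import math

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha in bits, for alpha in (0,1) or (1,inf)."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("limiting orders must be handled separately")
    return math.log2(sum(px ** alpha for px in p if px > 0)) / (1 - alpha)

def achievable_code_length(p, eps, grid=1000):
    """Smallest right-hand side of (1.2) over a grid of alpha in [1/2, 1)."""
    best = math.inf
    for i in range(grid):
        alpha = 0.5 + 0.5 * i / grid  # 0.5 <= alpha < 1
        penalty = alpha / (1 - alpha) * math.log2(1 / eps)
        best = min(best, renyi_entropy(p, alpha) + penalty)
    return best

# Assumed example distribution.
p = [0.5, 0.25, 0.25]
print(renyi_entropy(p, 0.5), renyi_entropy(p, 2.0))  # decreasing in alpha
print(achievable_code_length(p, eps=0.1))
```

For a single use of a small source the penalty term dominates, so the bound is typically loose; it becomes informative once the source has structure, such as many iid symbols.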

This analysis can be driven further if we consider sources with structure. In particular, consider
a sequence of sources that produce $n \in \mathbb{N}$ independent and identically distributed (iid) symbols
$X^n = (Y_1, Y_2, \ldots, Y_n)$, where each $Y_i$ is distributed according to the law $\tau_Y(y)$. We then consider a
sequence of $(\varepsilon, 2^{nR})$-codes for these sources, where the rate $R$ indicates the number of memory
bits required per symbol the source produces. For this case (1.2) reads

$$R \geq \frac{1}{n} H_\alpha(X^n)_p + \frac{\alpha}{n(1-\alpha)} \log_2 \frac{1}{\varepsilon} = H_\alpha(Y)_\tau + \frac{\alpha}{n(1-\alpha)} \log_2 \frac{1}{\varepsilon} , \qquad (1.4)$$

where we used additivity of the Renyi entropy to establish the equality. The above inequality
implies that such a sequence of $(\varepsilon, 2^{nR})$-codes exists for sufficiently large $n$ if $R > H_\alpha(X)_p$. And
finally, since this holds for all $\alpha \in [\frac{1}{2}, 1)$, we may take the limit $\alpha \to 1$ in (1.4) to recover Shannon’s
original result [144], which states that such codes exist if

$$R > H(X)_p, \quad \textrm{where} \quad H(X)_p = H_1(X)_p = -\sum_x p_X(x) \log_2 p_X(x) \qquad (1.5)$$

is the Shannon entropy of the source. This rate is in fact optimal, meaning that every scheme
with $R < H(X)_p$ necessarily fails with certainty as $n \to \infty$. This is an example of an asymptotic
statement (with infinite resources) and such statements can often be expressed in terms of the 
Shannon entropy or related information measures. 
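
The convergence of the achievable rate to the Shannon entropy can be illustrated numerically. The following sketch minimizes the right-hand side of (1.4) over a grid of orders for increasing $n$; the single-symbol law, the grid search and the function names are assumptions made for this illustration.

```python
import math

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha in bits, for alpha in (0,1) or (1,inf)."""
    return math.log2(sum(px ** alpha for px in p if px > 0)) / (1 - alpha)

def shannon_entropy(p):
    """Shannon entropy in bits."""
    return -sum(px * math.log2(px) for px in p if px > 0)

def achievable_rate(tau, n, eps, grid=2000):
    """Smallest right-hand side of (1.4) over a grid of alpha in [1/2, 1)."""
    best = math.inf
    for i in range(grid):
        alpha = 0.5 + 0.5 * i / grid
        penalty = alpha / (n * (1 - alpha)) * math.log2(1 / eps)
        best = min(best, renyi_entropy(tau, alpha) + penalty)
    return best

tau = [0.9, 0.1]  # assumed law of a single symbol
for n in (10, 100, 1000, 10**4, 10**6):
    print(n, achievable_rate(tau, n, eps=0.01))
print("Shannon entropy:", shannon_entropy(tau))
```

The printed rates decrease with $n$ and approach the Shannon entropy from above, while for small $n$ the penalty term is clearly visible.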


Analysis with Smooth Entropies 

Another fruitful approach to analyze this problem brings us back to the unstructured, one-shot 
case. We note that the above analysis can be refined without assuming any structure by “smoothing”
the entropy. Namely, we construct an $(\varepsilon, m)$-code for the source $p_X$ using the following
recipe:

• Fix $\delta \in (0, \varepsilon)$ and let $\tilde{p}_X$ be any probability distribution that is $(\varepsilon - \delta)$-close to $p_X$ in variational
distance. Namely, we require that $\Delta(\tilde{p}_X, p_X) \leq \varepsilon - \delta$, where $\Delta(\cdot,\cdot)$ denotes the variational
distance.

• Then, take a $(\delta, m)$-code for the source $\tilde{p}_X$. Instantiating (1.2) with $\alpha = \frac{1}{2}$, we find that there
exists such a code as long as $\log_2 m \geq H_{1/2}(X)_{\tilde{p}} + \log_2 \frac{1}{\delta}$.

• Apply this code to a source with the distribution $p_X$ instead, incurring a total error of at most
$\delta + \Delta(\tilde{p}_X, p_X) \leq \varepsilon$. (This uses the triangle inequality and the fact that the variational distance
contracts when we process information through the encoder and decoder.)

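Hence, optimizing this over all such $\tilde{p}_X$, we find that there exists an $(\varepsilon, m)$-code if

$$\log_2 m \geq H_{\max}^{\varepsilon-\delta}(X)_p + \log_2 \frac{1}{\delta}, \quad \textrm{where} \quad H_{\max}^{\varepsilon'}(X)_p := \min_{\tilde{p}_X :\, \Delta(\tilde{p}_X, p_X) \leq \varepsilon'} H_{1/2}(X)_{\tilde{p}} \qquad (1.6)$$

is the $\varepsilon'$-smooth max-entropy, which is based on the Renyi entropy of order $\frac{1}{2}$.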
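
The smoothing recipe can be sketched in a few lines of Python. We assume here that the variational distance between normalized distributions is half their $L_1$ distance, and we pick one particular feasible smoothed distribution, obtained by moving probability mass from the least likely symbols onto the most likely one. Any feasible choice upper-bounds the minimum in (1.6), so the resulting code length remains achievable but is not claimed to be the tightest value. All function names are hypothetical.

```python
import math

def renyi_half(p):
    """Renyi entropy of order 1/2 in bits: 2 * log2(sum_x sqrt(p(x)))."""
    return 2 * math.log2(sum(math.sqrt(px) for px in p if px > 0))

def variational_distance(p, q):
    """Assumed convention for normalized distributions: half the L1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def shift_tail_mass(p, eps):
    """Move up to eps of mass from the least likely symbols onto the most
    likely one. Expects p sorted in non-increasing order."""
    q = list(p)
    remaining = eps
    for i in range(len(q) - 1, 0, -1):
        cut = min(q[i], remaining)
        q[i] -= cut
        remaining -= cut
        if remaining <= 0:
            break
    q[0] += eps - remaining
    return q

def smooth_max_entropy_upper_bound(p, eps):
    """H_1/2 of one feasible smoothed distribution: an upper bound on (1.6)."""
    p_sorted = sorted(p, reverse=True)
    q = shift_tail_mass(p_sorted, eps)
    assert variational_distance(q, p_sorted) <= eps + 1e-12  # feasibility check
    return renyi_half(q)

def achievable_code_length(p, eps, grid=200):
    """Right-hand side of (1.6), minimized over a grid of delta in (0, eps)."""
    best = math.inf
    for j in range(1, grid):
        delta = eps * j / grid
        bound = smooth_max_entropy_upper_bound(p, eps - delta)
        best = min(best, bound + math.log2(1 / delta))
    return best

p = [0.5, 0.25, 0.125, 0.125]  # assumed example distribution
print(renyi_half(p), smooth_max_entropy_upper_bound(p, 0.125))
print(achievable_code_length(p, 0.25))
```

Smoothing lowers the order-$\frac{1}{2}$ entropy by discarding unlikely symbols, at the price of the term $\log_2 \frac{1}{\delta}$ for the part of the error budget spent on the code itself.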

Furthermore, this bound is approximately optimal in the following sense. It can be shown [138]
that all $(\varepsilon, m)$-codes must satisfy $\log_2 m \geq H_{\max}^{\varepsilon}(X)_p$. Such bounds that give restrictions valid for
all codes are called converse bounds. Rewriting this, we see that the minimal value of $m$ for a
given $\varepsilon$, denoted $m^*(\varepsilon)$, satisfies

$$H_{\max}^{\varepsilon}(X)_p \leq \log_2 m^*(\varepsilon) \leq \inf_{\delta \in (0,\varepsilon)} \Big\{ H_{\max}^{\varepsilon-\delta}(X)_p + \log_2 \frac{1}{\delta} \Big\} . \qquad (1.7)$$

We thus informally say that the memory required for one-shot source compression is characterized
by the smooth max-Renyi entropy.^3

Finally, we again consider the case of an iid source, and as before, we expect that in the limit
of large $n$, the optimal compression rate $\frac{1}{n} \log_2 m^*(\varepsilon)$ should be characterized by the Shannon entropy.
This is in fact an expression of an entropic version of the asymptotic equipartition property, which
states that

$$\lim_{n \to \infty} \frac{1}{n} H_{\max}^{\varepsilon'}(X^n)_p = H(Y)_\tau \quad \textrm{for all } \varepsilon' \in (0,1) . \qquad (1.8)$$


Why Shannon Entropy is Inadequate 

To see why the Shannon entropy does not suffice to characterize one-shot source compression, 
consider a source that produces the symbol ‘♯’ with probability $1/2$ and $k$ other symbols with
probability $1/(2k)$ each. On the one hand, for any fixed failure probability $\varepsilon \ll 1$, the converse
bound in (1.7) evaluates to approximately $\log_2 k$. This implies that we cannot compress this source
much beyond its Hartley entropy. On the other hand, the Shannon entropy of this distribution is
$\frac{1}{2}(\log_2 k + 2)$ and underestimates the required memory by a factor of two.
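
This gap can be checked numerically. The sketch below builds the example source for an assumed value of $k$ and compares its Shannon entropy with the code length of a scheme that keeps only the most likely symbols. The function names and the chosen values of $k$ and of the failure probability are illustrative assumptions.

```python
import math

def example_source(k):
    """One symbol with probability 1/2 and k symbols with probability 1/(2k)."""
    return [0.5] + [1 / (2 * k)] * k

def shannon_entropy(p):
    """Shannon entropy in bits."""
    return -sum(px * math.log2(px) for px in p if px > 0)

def hartley_entropy(p):
    """Hartley entropy in bits: log2 of the support size."""
    return math.log2(sum(1 for px in p if px > 0))

def minimal_code_size(p, eps):
    """Smallest m such that the m most likely symbols carry mass >= 1 - eps."""
    kept = 0.0
    for m, px in enumerate(sorted(p, reverse=True), start=1):
        kept += px
        if 1.0 - kept <= eps:
            return m
    return len(p)

k = 2 ** 16
p = example_source(k)
print("Shannon entropy:      ", shannon_entropy(p))
print("log2 of minimal m:    ", math.log2(minimal_code_size(p, eps=0.01)))
print("Hartley entropy:      ", hartley_entropy(p))
```

For this choice of $k$ the required code length stays close to the Hartley entropy of roughly 16 bits, whereas the Shannon entropy is only about half of that.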


1.3 Outline of the Book 

The goal of this book is to explore quantum generalizations of the measures encountered in our
example, namely the Renyi entropies and smooth entropies. Our exposition assumes that the
reader is familiar with basic probability theory and linear algebra, but not necessarily with quantum
mechanics. For the most part we restrict our attention to physical systems whose observable
properties are discrete, e.g. spin systems or excitations of particles bound in a potential. This
allows us to avoid mathematical subtleties that appear in the study of systems with observable
properties that are continuous. We will, however, mention generalizations to continuous systems
where applicable and refer the reader to the relevant literature.

^3 The smoothing approach in the classical setting was first formally discussed in [141]. A detailed analysis of
one-shot source compression, including quantum side information, can be found in [138].

The book is organized as follows: 

Chapter 2 introduces the notation used throughout the book and presents the mathematical
framework underlying quantum theory for general (potentially continuous) systems. Our notation
is summarized in Table 2.1 so that the remainder of the chapter can easily be skipped by
expert readers. The exposition starts with introducing events as linear operators on a Hilbert
space (Section 2.2) and then introduces states as functionals on events (Section 2.3). Multipartite
systems and entanglement are then discussed using the Hilbert space tensor product
(Section 2.4), and quantum channels are introduced as a means to study the evolution
of systems in the Schrödinger and Heisenberg picture (Section 2.6). Finally, this chapter
assembles the mathematical toolbox required to prove the results in the later chapters, including
a discussion of operator monotone, concave and convex functions on positive operators
(Section 2.5). Most results discussed here are well-known and proofs are omitted. We do not
attempt to provide an intuition or physical justification for the mathematical models employed,
but instead highlight some connections to classical information theory.

Chapter 3 treats norms and metrics on quantum states. First we discuss Schatten norms and a 
variational characterization of the Schatten norms of positive operators that will be very useful 
in the remainder of the book (Section 3.1). We then move on to discuss a natural dual norm for 
sub-normalized quantum states and the metric it induces, the trace distance (Section 3.2). The 
fidelity is another very prominent measure for the proximity of quantum states, and here we 
sensibly extend its definition to cover sub-normalized states (Section 3.3). Finally, based on
this generalized fidelity, we introduce a powerful metric for sub-normalized quantum states,
the purified distance (Section 3.4). This metric combines the clear operational interpretation
of the trace distance with the desirable mathematical properties of the fidelity.

Chapter 4 discusses quantum generalizations of the Renyi divergence. Divergences (or relative
entropies) are measures of distance between quantum states (although they are not metrics)
and entropy as well as conditional entropy can conveniently be defined in terms of the
divergence. Moreover, the entropies inherit many important properties from corresponding
properties of the divergence. In this chapter, we first discuss the classical special case of the
Renyi divergence (Section 4.1). This allows us to point out several properties that we expect
a suitable quantum generalization of the Renyi divergence to satisfy. Most prominently we
expect them to satisfy a data-processing inequality which states that the divergence is contractive
under application of quantum channels to both states. Based on this, we then explore
quantum generalizations of the Renyi divergence and find that there is more than one quantum
generalization that satisfies all desired properties (Section 4.2).

We will mostly focus on two different quantum Renyi divergences, called the minimal and
Petz quantum Renyi divergence (Sections 4.3-4.4). The first quantum generalization is called
the minimal quantum Renyi divergence (because it is the smallest quantum Renyi divergence
that satisfies a data-processing inequality), and is also known as “sandwiched” Renyi relative
entropy in the literature. It has found operational significance in the strong converse regime
of asymmetric binary hypothesis testing. The second quantum generalization is Petz’ quantum
Renyi relative entropy, which attains operational significance in the quantum generalization of
Chernoff’s and Hoeffding’s bound on the success probability in binary hypothesis testing (cf.
Section 7.1).

Chapter 5 generalizes conditional Renyi entropies (and unconditional entropies as a special 
case) to the quantum setting. The idea is to define operationally relevant measures of uncertainty
about the state of a quantum system from the perspective of an observer with access
to some side information stored in another quantum system. As a preparation, we discuss
how the conditional Shannon entropy and the conditional von Neumann entropy can be conveniently
expressed in terms of relative entropy either directly or using a variational formula
(Section 5.1). Based on the two families of quantum Renyi divergences, we then define four
families of quantum conditional Renyi entropies (Section 5.2). We then prove various properties
of these entropies, including data-processing inequalities that they directly inherit from
the underlying divergence. A genuinely quantum feature of conditional Renyi entropies is the 
duality relation for pure states (Section 5.3). These duality relations also show that the four 
definitions are not independent, and thereby also reveal a connection between the minimal 
and the Petz quantum Renyi divergence. Furthermore, even though the chain rule does not 
hold with equality for our definitions, we present some inequalities that replace the chain rule 
(Section 5.4). 

Chapter 6 deals with smooth conditional entropies in the quantum setting. First, we discuss 
the min-entropy and the max-entropy, two special cases of Renyi entropies that underlie the definition
of the smooth entropy (Section 6.1). In particular, we show that they can be expressed
as semi-definite programs, which means that they can be approximated efficiently (for small 
quantum systems) using standard numerical solvers. The idea is that these two entropies serve 
as representatives for the Renyi entropies with large and small a, respectively. We then define 
the smooth entropies (Section 6.2) as optimizations of the min- and max-entropy over a ball of 
states close in purified distance. We explore some of their properties, including chain rules and 
duality relations (Section 6.3). Finally, the main application of the smooth entropy calculus is 
an entropic version of the asymptotic equipartition property for conditional entropies, which 
states that the (regularized) smooth min- and max-entropies converge to the conditional von 
Neumann entropy for iid product states (Section 6.4). 

Chapter 7 concludes the book with a few selected applications of the mathematical concepts 
surveyed here. First, we discuss various aspects of binary hypothesis testing, including Stein’s
lemma, the Chernoff bound and the Hoeffding bound as well as strong converse exponents
(Section 7.1). This provides an operational interpretation of the Renyi divergences discussed
in Chapter 4. Next, we discuss how the duality relations and the chain rule for conditional
Renyi entropies can be used to derive entropic uncertainty relations, which are powerful manifestations
of the uncertainty principle of quantum mechanics (Section 7.2). Finally, we discuss
randomness extraction against quantum side information, a premier application of the smooth 
entropy formalism that justifies its central importance in quantum cryptography (Section 7.3). 


What This Book Does Not Cover 

It is beyond the scope of this book to provide a comprehensive treatment of the many applications 
the mathematical framework reviewed here has found. However, in addition to Chapter 7, we will 
mention a few of the most important applications in the background section of each chapter.
Tsallis entropies [162] have found several applications in physics, but they have no solid foundation in
information theory and we will not discuss them here. It is worth mentioning, however, that many
of the mathematical developments in this book can be applied to quantum Tsallis entropies as
well. There are alternative frameworks besides the smooth entropy framework that allow one to treat
unstructured resources, most prominently the information-spectrum method and its quantum
generalization due to Nagaoka and Hayashi [124]. These approaches are not covered here since they
are asymptotically equivalent to the smooth entropy approach [45,157]. Finally, this book does 
not cover Renyi and smooth versions of mutual information and conditional mutual information. 
These quantities are a topic of active research. 



Chapter 2 

Modeling Quantum Information 


Classical as well as quantum information is stored in physical systems, or “information is
inevitably physical” as Rolf Landauer famously said. These physical systems are ultimately
governed by the laws of quantum mechanics. In this chapter we quickly review the relevant
mathematical foundations of quantum theory and introduce notational conventions that will be used
throughout the book.

In particular we will discuss concepts of functional and matrix analysis as well as linear algebra 
that will be of use later. We consider general separable Hilbert spaces in this chapter, even though 
in the rest of the book we restrict our attention to the finite-dimensional case. This digression is 
useful because it motivates the notation we use throughout the book, and it allows us to distinguish 
between the mathematical structure afforded by quantum theory and the additional structure that 
is only present in the finite-dimensional case. 

Our notation is summarized in Section 2.1 and the remainder of this chapter can safely be 
skipped by expert readers. The presentation here is compressed and we omit proofs. We instead 
refer to standard textbooks (see Section 2.7 for some references) for a more comprehensive
treatment.


2.1 General Remarks on Notation 

The notational conventions for this book are summarized in Table 2.1. The table includes
references to the sections where the corresponding concepts are introduced. Throughout this book we
are careful to distinguish between linear operators (e.g. events and Kraus operators) and
functionals on the linear operators (e.g. states), which are also represented as linear operators (e.g.
density operators). This distinction is inspired by the study of infinite-dimensional systems where
these objects do not necessarily have the same mathematical structure, but it is also helpful in the
finite-dimensional setting.¹

We do not specify a particular basis for the logarithm throughout this book, and simply use
exp to denote the inverse of log.² The natural logarithm is denoted by ln.

¹ For example, it sheds light on the fact that we use the operator norm for ordinary linear operators and its dual
norm, the trace norm, for density operators.

² The reader is invited to think of log(x) as the binary logarithm of x and, consequently, exp(x) = 2^x, as is customary
in quantum information theory.

Symbol                            | Variants                                | Description                                                          | Section
----------------------------------+-----------------------------------------+----------------------------------------------------------------------+--------
$\mathbb{R}$, $\mathbb{C}$        | $\mathbb{R}_+$                          | real and complex fields (and non-negative reals)                     |
$\mathbb{N}$                      |                                         | natural numbers                                                      |
log, exp                          | ln, e                                   | logarithm (to unspecified basis), and its inverse, the exponential   |
                                  |                                         | function (natural logarithm and Euler’s constant)                    |
$\mathcal{H}$                     | $\mathcal{H}_{AB}$, $\mathcal{H}_X$     | Hilbert spaces (for joint system AB and system X)                    | 2.2.1
$\langle\cdot|$, $|\cdot\rangle$  |                                         | bra and ket                                                          |
$\operatorname{Tr}(\cdot)$        | $\operatorname{Tr}_A$                   | trace (partial trace)                                                | 2.3.1
$\otimes$                         | $(\cdot)^{\otimes n}$                   | tensor product (n-fold tensor product)                               | 2.4.1
$\oplus$                          |                                         | direct sum for block diagonal operators                              | 2.2.2
$A \ll B$                         |                                         | A is dominated by B, i.e. kernel of A contains kernel of B           |
$A \perp B$                       |                                         | A and B are orthogonal, i.e. AB = BA = 0                             |
$\mathcal{L}$                     | $\mathcal{L}(A,B)$                      | bounded linear operators (from $\mathcal{H}_A$ to $\mathcal{H}_B$)   | 2.2.1
$\mathcal{L}^\dagger$             | $\mathcal{L}^\dagger(B)$                | self-adjoint operators (acting on $\mathcal{H}_B$)                   |
$\mathcal{P}$                     | $\mathcal{P}(CD)$                       | positive semi-definite operators (acting on $\mathcal{H}_{CD}$)      |
$\{A \geq B\}$                    |                                         | projector on subspace where A − B is non-negative                    |
$\| \cdot \|$                     |                                         | operator norm                                                        | 2.2.1
$\mathcal{L}_\bullet$             | $\mathcal{L}_\bullet(E)$                | contractions in $\mathcal{L}$ (acting on $\mathcal{H}_E$)            |
$\mathcal{P}_\bullet$             | $\mathcal{P}_\bullet(A)$                | contractions in $\mathcal{P}$ (corresponding to events on A)         | 2.2.2
$I$                               | $I_Y$                                   | identity operator (acting on $\mathcal{H}_Y$)                        |
$\langle\cdot,\cdot\rangle$       |                                         | Hilbert-Schmidt inner product                                        | 2.3.1
$\mathcal{T}$                     |                                         | trace-class operators representing linear functionals                |
$\mathcal{S}$                     |                                         | operators representing positive functionals                          |
$\| \cdot \|_*$                   | $\operatorname{Tr}|\cdot|$              | trace norm on functionals                                            | 2.3.1
$\mathcal{S}_\bullet$             | $\mathcal{S}_\bullet(A)$                | sub-normalized density operators (on A)                              | 2.3.2
$\mathcal{S}_\circ$               | $\mathcal{S}_\circ(B)$                  | normalized density operators, or states (on B)                       |
$\pi$                             | $\pi_A$                                 | fully mixed state (on A), in finite dimensions                       | 2.3.2
$\psi$                            | $\psi_{AB}$                             | maximally entangled state (between A and B), in finite dimensions    | 2.4.2
CB                                | CB(A,B)                                 | completely bounded maps (from $\mathcal{L}(A)$ to $\mathcal{L}(B)$)  | 2.6.1
CP                                |                                         | completely positive maps                                             | 2.6.2
CPTP                              | CPTNI                                   | completely positive trace-preserving (trace-non-increasing) map      |
$\| \cdot \|_+$                   | $\| \cdot \|_p$                         | positive cone dual norm (Schatten p-norm)                            | 3.1
$\Delta(\cdot,\cdot)$             |                                         | generalized trace distance for sub-normalized states                 | 3.2
$F(\cdot,\cdot)$                  | $F_*(\cdot,\cdot)$                      | fidelity (generalized fidelity for sub-normalized states)            | 3.3
$P(\cdot,\cdot)$                  |                                         | purified distance for sub-normalized states                          | 3.4

† This equivalence only holds if the underlying Hilbert space is finite-dimensional.
Table 2.1 Overview of Notational Conventions.


We label different physical systems by capital Latin letters A, B, C, D, and E, as well as X, 
Y, and Z which are specifically reserved for classical systems. The label thus always determines 
if a system is quantum or classical. We often use these labels as subscripts to guide the reader 
by indicating which system a mathematical object belongs to. We drop the subscripts when they 
are evident in the context of an expression (or if we are not talking about a specific system). We 
also use the capital Latin letters L, K, H, M, and N to denote linear operators, where the last 
two are reserved for positive semi-definite operators. The identity operator is denoted I. Density 
operators, on the other hand, are denoted by lowercase Greek letters $\rho$, $\tau$, $\sigma$, and $\omega$. We reserve
$\pi$ and $\psi$ for the fully mixed state and the maximally entangled state, respectively. Calligraphic
letters are used to denote quantum channels and other maps acting on operators.


2.2 Linear Operators and Events

For our purposes, a physical system is fully characterized by the set of events that can be observed 
on it. For classical systems, these events are traditionally modeled as a $\sigma$-algebra of subsets of
the sample space, usually the power set in the discrete case. For quantum systems the structure of 
events is necessarily more complex, even in the discrete case. This is due to the non-commutative 
nature of quantum theory; the union and intersection of events are generally ill-defined since it 
matters in which order events are observed. 

Let us first review the mathematical model used to describe events in quantum mechanics 
(as positive semi-definite operators on a Hilbert space). Once this is done, we discuss physical 
systems carrying quantum and classical information. 


2.2.1 Hilbert Spaces and Linear Operators 

For concreteness and to introduce the notation, we consider two physical systems A and B as 
examples in the following. We associate to A a separable Hilbert space $\mathcal{H}_A$ over the field $\mathbb{C}$,
equipped with an inner product $\langle \cdot, \cdot \rangle : \mathcal{H}_A \times \mathcal{H}_A \to \mathbb{C}$. In the finite-dimensional case, this is
simply a complex inner product space, but we will follow a tradition in quantum information
theory and call $\mathcal{H}_A$ a Hilbert space also in this case. Analogously, we associate the Hilbert space
$\mathcal{H}_B$ to the physical system B.


Linear Operators 

Our main objects of study are linear operators acting on the system’s Hilbert space. We
consistently use upper-case Latin letters to denote such linear operators. More precisely, we consider
the set of bounded linear operators from $\mathcal{H}_A$ to $\mathcal{H}_B$, which we denote by $\mathcal{L}(A,B)$. Bounded here
refers to the operator norm induced by the Hilbert space’s inner product.


The operator norm on $\mathcal{L}(A,B)$ is defined as

    $\| \cdot \| : \ L \mapsto \sup \big\{ \sqrt{\langle Lv, Lv \rangle_B} \, : \, v \in \mathcal{H}_A, \ \langle v, v \rangle_A \leq 1 \big\}$ .    (2.1)


For all $L \in \mathcal{L}(A,B)$, we have $\|L\| < \infty$ by definition. A linear operator is continuous if and
only if it is bounded.³ Let us now summarize some important concepts and notation that we will
frequently use throughout this book.

• The identity operator on $\mathcal{H}_A$ is denoted $I_A$.

• The adjoint of a linear operator $L \in \mathcal{L}(A,B)$ is the unique operator $L^\dagger \in \mathcal{L}(B,A)$ that satisfies
$\langle w, Lv \rangle_B = \langle L^\dagger w, v \rangle_A$ for all $v \in \mathcal{H}_A$, $w \in \mathcal{H}_B$. Clearly, $(L^\dagger)^\dagger = L$.

• For scalars $\alpha \in \mathbb{C}$, the adjoint corresponds to the complex conjugate, $\alpha^\dagger = \overline{\alpha}$.

• We find $(LK)^\dagger = K^\dagger L^\dagger$ by applying the definition twice.

• The kernel of a linear operator $L \in \mathcal{L}(A,B)$ is the subspace of $\mathcal{H}_A$ spanned by vectors $v \in \mathcal{H}_A$
satisfying $Lv = 0$. The support of L is its orthogonal complement in $\mathcal{H}_A$ and the rank is the
dimension of the support. Finally, the image of L is the subspace of $\mathcal{H}_B$ spanned by vectors
$w \in \mathcal{H}_B$ such that $w = Lv$ for some $v \in \mathcal{H}_A$.

• For operators $K, L \in \mathcal{L}(A)$ we say that L is dominated by K if the kernel of K is contained in
the kernel of L. Namely, we write $L \ll K$ if and only if

    $Kv = 0 \implies Lv = 0$ for all $v \in \mathcal{H}_A$ .    (2.3)

• We say $K, L \in \mathcal{L}(A)$ are orthogonal (denoted $K \perp L$) if $KL = LK = 0$.

• We call a linear operator $U \in \mathcal{L}(A,B)$ an isometry if it preserves the inner product, namely if
$\langle Uv, Uw \rangle_B = \langle v, w \rangle_A$ for all $v, w \in \mathcal{H}_A$. This holds if $U^\dagger U = I_A$.

• An isometry is an example of a contraction, i.e. an operator $L \in \mathcal{L}(A,B)$ satisfying $\|L\| \leq 1$.
The set of all such contractions is denoted $\mathcal{L}_\bullet(A,B)$. Here the bullet in the subscript of
$\mathcal{L}_\bullet(A,B)$ simply illustrates that we restrict $\mathcal{L}(A,B)$ to the unit ball for the norm $\| \cdot \|$.

For any $L \in \mathcal{L}(A)$, we denote by $L^{-1}$ its Moore-Penrose generalized inverse or
pseudoinverse [130] (which always exists in finite dimensions). In particular, the generalized inverse
satisfies $L L^{-1} L = L$ and $L^{-1} L L^{-1} = L^{-1}$. If $L = L^\dagger$, the generalized inverse is just the usual inverse
evaluated on the operator’s support.

³ Relation to Operator Algebras: Let us note that $\mathcal{L}(A,B)$ with the norm $\| \cdot \|$ is a Banach space over $\mathbb{C}$.
Furthermore, the operator norm satisfies

    $\|L\|^2 = \|L^\dagger\|^2 = \|L^\dagger L\|$ and $\|LK\| \leq \|L\| \cdot \|K\|$    (2.2)

for any $L \in \mathcal{L}(A,B)$ and $K \in \mathcal{L}(B,A)$. The inequality states that the norm is sub-multiplicative.
The above properties of the norm imply that the space $\mathcal{L}(A)$ is (weakly) closed under multiplication and the
adjoint operation. In fact, $\mathcal{L}(A)$ constitutes a (Type I factor) von Neumann algebra or C* algebra. Alternatively, we
could have started our considerations right here by postulating a Type I von Neumann algebra as the fundamental
object describing individual physical systems, and then deriving the Hilbert space structure as a consequence.


Bras, Kets and Orthonormal Bases

We use the bra-ket notation throughout this book. For any vector $v_A \in \mathcal{H}_A$, we use its ket, denoted
$|v\rangle_A$, to describe the embedding

    $|v\rangle_A : \ \mathbb{C} \to \mathcal{H}_A, \quad \alpha \mapsto \alpha v_A$ .    (2.4)

Similarly, we use its bra, denoted $\langle v|_A$, to describe the functional

    $\langle v|_A : \ \mathcal{H}_A \to \mathbb{C}, \quad w_A \mapsto \langle v, w \rangle_A$ .    (2.5)

It is natural to view kets as linear operators from $\mathbb{C}$ to $\mathcal{H}_A$ and bras as linear operators from
$\mathcal{H}_A$ to $\mathbb{C}$. The above definitions then imply that

    $|Lv\rangle_B = L |v\rangle_A$, $\langle Lv|_B = \langle v|_A L^\dagger$, and $\langle v|_A = |v\rangle_A^\dagger$ .    (2.6)

Moreover, the inner product can equivalently be written as $\langle w, Lv \rangle_B = \langle w|_B L |v\rangle_A$. Conjugate
symmetry of the inner product then corresponds to the relation

    $\overline{\langle w|_B L |v\rangle_A} = \langle v|_A L^\dagger |w\rangle_B$ .    (2.7)


As a further example, we note that $|v\rangle_A$ is an isometry if and only if $\langle v|v \rangle_A = 1$.

In the following we will work exclusively with linear operators (including bras and kets) and
we will not use the underlying vectors (the elements of the Hilbert space) or the inner product of
the Hilbert space anymore.

We now restrict our attention to the space $\mathcal{L}(A) := \mathcal{L}(A,A)$ of bounded linear operators
acting on $\mathcal{H}_A$. An operator $U \in \mathcal{L}(A)$ is unitary if $U$ and $U^\dagger$ are isometries. An orthonormal
basis (ONB) of the system A (or the Hilbert space $\mathcal{H}_A$) is a set of vectors $\{e_x\}_x$, with $e_x \in \mathcal{H}_A$,
such that

    $\langle e_x | e_y \rangle_A = \delta_{x,y} := \begin{cases} 1 & x = y \\ 0 & x \neq y \end{cases}$ and $\sum_x |e_x\rangle\langle e_x|_A = I_A$ .    (2.8)

We denote the dimension of $\mathcal{H}_A$ by $d_A$ if it is finite and note that the index x ranges over $d_A$ distinct
values. For general separable Hilbert spaces x ranges over any countable set. (We do not usually
specify such index sets explicitly.) Various ONBs exist and are related by unitary operators: if
$\{e_x\}_x$ is an ONB then $\{U e_x\}_x$ is too, and, furthermore, given two ONBs there always exists a
unitary operator mapping one basis to the other, and vice versa.
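
The bra-ket conventions and the two ONB conditions in (2.8) can be checked numerically on a small example. The following is a minimal sketch in plain Python, assuming a qubit ($d_A = 2$) and representing kets as column matrices; the helper names `matmul` and `dagger` are illustrative choices and do not appear in the text.

```python
import math

def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Adjoint (conjugate transpose) of a matrix given as nested lists."""
    return [[a[i][j].conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]

# Kets are operators from C to H_A (column matrices); bras are their adjoints.
s = 1 / math.sqrt(2)
basis = [[[s], [s]], [[s], [-s]]]   # |e_0>, |e_1>: the Hadamard basis of a qubit

# Orthonormality: gram[x][y] = <e_x|e_y> should equal delta_{x,y}.
gram = [[matmul(dagger(ex), ey)[0][0] for ey in basis] for ex in basis]

# Completeness: sum_x |e_x><e_x| should equal the identity I_A.
projectors = [matmul(ex, dagger(ex)) for ex in basis]
completeness = [[sum(p[i][j] for p in projectors) for j in range(2)]
                for i in range(2)]

print(gram)
print(completeness)
```

Both printed matrices agree with the identity up to floating-point rounding, which is why the comparison is best done with a small tolerance.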


Positive Semi-Definite Operators 

A special role is played by operators that are self-adjoint and positive semi-definite. We call an
operator $H \in \mathcal{L}(A)$ self-adjoint if it satisfies $H = H^\dagger$, and the set of all self-adjoint operators in
$\mathcal{L}(A)$ is denoted $\mathcal{L}^\dagger(A)$. Such self-adjoint operators have a spectral decomposition,

    $H = \sum_x \lambda_x \, |e_x\rangle\langle e_x|$    (2.9)

where $\{\lambda_x\}_x \subset \mathbb{R}$ are called eigenvalues and $\{|e_x\rangle\}_x$ is an orthonormal basis with eigenvectors
$|e_x\rangle$. The set $\{\lambda_x\}_x$ is also called the spectrum of H, and it is unique.

Finally we introduce the set $\mathcal{P}(A)$ of positive semi-definite operators in $\mathcal{L}(A)$. An operator
$M \in \mathcal{L}(A)$ is positive semi-definite if and only if $M = L^\dagger L$ for some $L \in \mathcal{L}(A)$, so in
particular such operators are self-adjoint and have non-negative eigenvalues. Let us summarize some
important concepts and notation concerning self-adjoint and positive semi-definite operators here.

• We call $P \in \mathcal{L}^\dagger(A)$ a projector if it satisfies $P^2 = P$, i.e. if it has only eigenvalues 0 and 1. The
identity $I_A$ is a projector.

• For any $K, L \in \mathcal{L}^\dagger(A)$, we write $K \geq L$ if $K - L \in \mathcal{P}(A)$. Thus, the relation ‘$\geq$’ constitutes a
partial order on $\mathcal{L}(A)$.

• For any $G, H \in \mathcal{L}^\dagger(A)$, we use $\{G \geq H\}$ to denote the projector onto the subspace
corresponding to non-negative eigenvalues of $G - H$. Analogously, $\{G < H\} = I - \{G \geq H\}$ denotes the
projector onto the subspace corresponding to negative eigenvalues of $G - H$.


Matrix Representation and Transpose

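Linear operators in $\mathcal{L}(A,B)$ can be conveniently represented as matrices in $\mathbb{C}^{d_B} \times \mathbb{C}^{d_A}$. Namely
for any $L \in \mathcal{L}(A,B)$, we can write

    $L = \sum_{x,y} |f_y\rangle\langle f_y|_B \, L \, |e_x\rangle\langle e_x|_A = \sum_{x,y} \langle f_y | L | e_x \rangle \cdot |f_y\rangle\langle e_x|$ ,    (2.10)

where $\{e_x\}_x$ is an ONB of A and $\{f_y\}_y$ an ONB of B. This decomposes L into elementary operators
$|f_y\rangle\langle e_x| \in \mathcal{L}_\bullet(A,B)$ and the matrix with entries $[L]_{yx} = \langle f_y | L | e_x \rangle$.

Moreover, there always exists a choice of the two bases such that the resulting matrix is
diagonal. For such a choice of bases, we find the singular value decomposition $L = \sum_x s_x |f_x\rangle\langle e_x|$, where
$\{s_x\}_x$ with $s_x \geq 0$ are called the singular values of L. In particular, for self-adjoint operators, we
can choose $|f_x\rangle = |e_x\rangle$ and recover the eigenvalue decomposition with $s_x = |\lambda_x|$.

The transpose of L with regards to the bases $\{e_x\}$ and $\{f_y\}$ is defined as

    $L^T = \sum_{x,y} \langle f_y | L | e_x \rangle \cdot |e_x\rangle\langle f_y|$ , $L^T \in \mathcal{L}(B,A)$    (2.11)

Importantly, in contrast to the adjoint, the transpose is only defined with regards to a particular
basis. Also contrast (2.11) with the matrix representation of $L^\dagger$,

    $L^\dagger = \sum_{x,y} \big( \langle f_y | L | e_x \rangle \big)^\dagger \cdot |e_x\rangle\langle f_y| = \sum_{x,y} \overline{\langle f_y | L | e_x \rangle} \cdot |e_x\rangle\langle f_y| = \overline{L}^T$ .    (2.12)

Here, $\overline{L}$ denotes the complex conjugate, which is also basis dependent.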
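
To illustrate the basis dependence of the transpose, the following sketch evaluates the sum in (2.11) for one operator in two different orthonormal bases, assuming the special case A = B with $f_x = e_x$ on a qubit. The function name `transpose_in_basis` and the two example bases are assumptions made for this illustration only.

```python
def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Adjoint (conjugate transpose) of a matrix given as nested lists."""
    return [[a[i][j].conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]

def transpose_in_basis(op, basis):
    """Return sum_{x,y} <e_y|op|e_x> |e_x><e_y| for an ONB {e_x} of column kets."""
    d = len(basis)
    out = [[0j] * d for _ in range(d)]
    for ex in basis:
        for ey in basis:
            entry = matmul(dagger(ey), matmul(op, ex))[0][0]   # <e_y|op|e_x>
            term = matmul(ex, dagger(ey))                       # |e_x><e_y|
            for i in range(d):
                for j in range(d):
                    out[i][j] += entry * term[i][j]
    return out

L = [[0, 1], [0, 0]]
standard = [[[1], [0]], [[0], [1]]]     # {|0>, |1>}
phased = [[[1], [0]], [[0], [1j]]]      # {|0>, i|1>}: second vector carries a phase

t_standard = transpose_in_basis(L, standard)
t_phased = transpose_in_basis(L, phased)
adjoint = dagger(L)                     # defined without reference to any basis

print(t_standard)
print(t_phased)
print(adjoint)
```

The two transposes differ by a sign in their only non-zero entry, whereas the adjoint is the same operator regardless of the basis in which it is written down.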


2.2.2 Events and Measures 

We are now ready to attach physical meaning to the concepts introduced in the previous section, 
and apply them to physical systems carrying quantum information. 

Observable events on a quantum system A correspond to operators in the unit ball of $\mathcal{P}(A)$,
namely the set

    $\mathcal{P}_\bullet(A) := \{ M \in \mathcal{L}(A) : 0 \leq M \leq I \}$ .    (2.13)

(The bullet indicates that we restrict to the unit ball of the norm $\| \cdot \|$.)

Two events $M, N \in \mathcal{P}_\bullet(A)$ are called exclusive if $M + N$ is an event in $\mathcal{P}_\bullet(A)$ as well. In this
case, we call $M + N$ the union of the events M and N. A complete set of mutually exclusive events
that sum up to the identity is called a positive operator valued measure (POVM). More generally,
for any measurable space $(\mathcal{X}, \Sigma)$ with $\Sigma$ a $\sigma$-algebra, a POVM is a function

    $O_A : \Sigma \to \mathcal{P}_\bullet(A)$ with $O_A(\mathcal{X}) = I_A$    (2.14)

that is $\sigma$-additive, meaning that $O_A(\bigcup_i \mathcal{X}_i) = \sum_i O_A(\mathcal{X}_i)$ for mutually disjoint subsets $\mathcal{X}_i \subset \mathcal{X}$.

This definition is too general for our purposes here, and we will restrict our attention to the case
where $\mathcal{X}$ is discrete and $\Sigma$ the power set of $\mathcal{X}$. In that case the POVM is fully determined if we
associate mutually exclusive events to each $x \in \mathcal{X}$.

A function $x \mapsto M_A(x)$ with $M_A(x) \in \mathcal{P}_\bullet(A)$, $\sum_x M_A(x) = I_A$ is called a positive operator
valued measure (POVM) on A.


We assume that x ranges over a countable set for this definition, and we will in fact not discuss
measurements with continuous outcomes in this book. We call $x \mapsto M_A(x)$ a projective measure
if all $M_A(x)$ are projectors, and we call it rank-one if all $M_A(x)$ have rank one.


Structure of Classical Systems 

Classical systems have the distinguishing property that all events commute. 

To model a classical system X in our quantum framework, we restrict $\mathcal{P}_\bullet(X)$ to a set of events
that commute. These are diagonalized by a common ONB, which we call the classical basis of X.
For simplicity, the classical basis is denoted $\{x\}_x$ and the corresponding kets are $|x\rangle_X$. (To avoid
confusion, we will call the index y or z instead of x if the systems Y and Z are considered instead.)

Every $M \in \mathcal{P}_\bullet(X)$ on a classical system can be written as

    $M = \sum_x M(x) \, |x\rangle\langle x|_X = \bigoplus_x M(x)$, where $0 \leq M(x) \leq 1$ .    (2.15)

Instead of writing down the basis projectors, $|x\rangle\langle x|$, we sometimes employ the direct sum
notation to illustrate the block-diagonal structure of such operators. In the following, whenever we
introduce a classical event M on X we also implicitly introduce the function M(x), and vice versa.

This definition of “classical” events still goes beyond the usual classical formalism of discrete
probability theory. In the usual formalism, M represents a subset of the sample space (an element
of its $\sigma$-algebra), and thus corresponds to a projector in our language, with $M(x) \in \{0,1\}$
indicating if x is in the set. Our formalism, in contrast, allows us to model probabilistic events, i.e. the
event M occurs at most with probability $M(x) \in [0,1]$ even if the state is deterministically x.⁴

⁴ This generalization is quite useful as it, for example, allows us to see the optimal (probabilistic) Neyman-Pearson
test as an event.


2.3 Functionals and States

States of a physical system are functionals on the set of bounded linear operators that map events
to the probability that the respective event occurs. Continuous linear functionals can be
represented as trace-class operators, which leads us to density operators for quantum and classical
systems.


2.3.1 Trace and Trace-Class Operators

The most fundamental linear functional is the trace. For any orthonormal basis $\{e_x\}_x$ of A, we
define the trace over A as

    $\operatorname{Tr}_A(\cdot) : \ \mathcal{L}(A) \to \mathbb{C}, \quad L \mapsto \sum_x \langle e_x | L | e_x \rangle_A$ .    (2.16)

Note that Tr(L) is finite if $d_A < \infty$ or more generally if L is trace-class. The trace is cyclic, namely
we have

    $\operatorname{Tr}_A(KL) = \operatorname{Tr}_B(LK)$    (2.17)

for any two operators $L \in \mathcal{L}(A,B)$, $K \in \mathcal{L}(B,A)$ when KL and LK are trace-class. Thus, in
particular, for any $L \in \mathcal{L}(A)$, we have $\operatorname{Tr}_A(L) = \operatorname{Tr}_B(U L U^\dagger)$ for any isometry $U \in \mathcal{L}(A,B)$,
which shows that the particular choice of basis used for the definition of the trace in (2.16) is
irrelevant. Finally, we have $\operatorname{Tr}(L^\dagger) = \overline{\operatorname{Tr}(L)}$.
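
The cyclicity in (2.17) holds even when the two traces are taken over spaces of different dimension. A minimal sketch, assuming $d_A = 2$ and $d_B = 3$ and two arbitrarily chosen example matrices (the helper names are illustrative):

```python
def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(a):
    """Sum of the diagonal entries of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))

L = [[1, 2j], [0, 1], [3, -1]]     # 3x2 matrix: maps A (d_A = 2) to B (d_B = 3)
K = [[1, 0, 2], [1j, 1, 0]]        # 2x3 matrix: maps B back to A

tr_A = trace(matmul(K, L))         # trace over A of the 2x2 operator KL
tr_B = trace(matmul(L, K))         # trace over B of the 3x3 operator LK
print(tr_A, tr_B)
```

Although KL and LK are matrices of different sizes, the two traces coincide.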


Trace-Class Operators 

Using the trace, continuous linear functionals can be conveniently represented as elements of the
dual Banach space of $\mathcal{L}(A)$, namely the space of linear operators on $\mathcal{H}_A$ with bounded trace
norm.


The trace norm on $\mathcal{L}(A)$ is defined as

    $\| \cdot \|_* : \ \xi \mapsto \operatorname{Tr}|\xi| = \operatorname{Tr}\big( \sqrt{\xi^\dagger \xi} \big)$ .    (2.18)

Operators $\xi \in \mathcal{L}(A)$ with $\|\xi\|_* < \infty$ are called trace-class operators.

We denote the subspace of $\mathcal{L}(A)$ consisting of trace-class operators by $\mathcal{T}(A)$ and we use
lower-case Greek letters to denote elements of $\mathcal{T}(A)$. In infinite dimensions $\mathcal{T}(A)$ is a proper
subspace of $\mathcal{L}(A)$. In finite dimensions $\mathcal{L}(A)$ and $\mathcal{T}(A)$ coincide, but we will use this convention
to distinguish between linear operators and linear operators representing functionals nonetheless.

For every trace-class operator $\xi \in \mathcal{T}(A)$, we define the functional $F_\xi(L) := \langle \xi, L \rangle$ using the
sesquilinear form

    $\langle \cdot, \cdot \rangle : \ \mathcal{T}(A) \times \mathcal{L}(A) \to \mathbb{C}, \quad (\xi, L) \mapsto \operatorname{Tr}(\xi^\dagger L)$ .    (2.19)

This form is continuous in both $\mathcal{L}(A)$ and $\mathcal{T}(A)$ with regards to the respective norms on these
spaces, which is a direct consequence of Hölder’s inequality $|\operatorname{Tr}(\xi^\dagger L)| \leq \|\xi\|_* \cdot \|L\|$.⁵ In finite
dimensions it is also tempting to view $\mathcal{L}(A) = \mathcal{T}(A)$ as a Hilbert space with $\langle \cdot, \cdot \rangle$ as its inner
product, the Hilbert-Schmidt inner product. Finally, positive functionals map $\mathcal{P}(A)$ onto the
positive reals. Since $\operatorname{Tr}(\omega M) \geq 0$ for all $M \geq 0$ if and only if $\omega \geq 0$, we find that positive functionals
correspond to positive semi-definite operators in $\mathcal{T}(A)$, and we denote these by $\mathcal{S}(A)$.

⁵ Note also that the norms $\| \cdot \|$ and $\| \cdot \|_*$ are dual with regards to this form, namely we have

    $\|\xi\|_* = \sup \{ |\langle \xi, L \rangle| : L \in \mathcal{L}_\bullet(A) \}$ .    (2.20)

The trace norm is thus sometimes also called the dual norm.


2.3.2 States and Density Operators 

A state of a physical system A is a functional that maps events $M \in \mathcal{P}_\bullet(A)$ to the respective
probability that M is observed. We want the probability of the union of two mutually exclusive
events to be additive, and thus such functionals must be linear. Furthermore, we require them to
be continuous with regards to small perturbations of the events. Finally, they ought to map events
into the interval [0,1], hence they must also be positive and normalized.

Based on the discussion in the previous section, we can conveniently parametrize all
functionals corresponding to states as follows. We define the set of sub-normalized density operators as
trace-class operators in the unit ball,

    $\mathcal{S}_\bullet(A) := \{ \rho_A \in \mathcal{T}(A) : \rho_A \geq 0 \ \wedge \ \operatorname{Tr}(\rho_A) \leq 1 \}$ .    (2.21)

Here the bullet refers to the unit ball in the norm $\| \cdot \|_*$. (This norm simply corresponds to the
trace for positive semi-definite operators.)


For any operator $\rho_A \in \mathcal{S}_\bullet(A)$, we define the functional

    $\Pr_\rho(\cdot) : \ \mathcal{P}_\bullet(A) \to [0,1], \quad M \mapsto \langle \rho_A, M \rangle = \operatorname{Tr}(\rho_A M)$ ,    (2.22)

which maps events to the probability that the event occurs.


This is an expression of Born’s rule, and often taken as an axiom of quantum mechanics. Here
it is just a natural way to map events to probabilities. We call such operators $\rho_A$ density operators.

It is often prudent to further require that the union of all events in a POVM, namely the event
I, has probability 1. This leads us to normalized density operators:

Quantum states are represented as normalized density operators in

    $\mathcal{S}_\circ(A) := \{ \rho_A \in \mathcal{T}(A) : \rho_A \geq 0 \ \wedge \ \operatorname{Tr}(\rho_A) = 1 \}$ .    (2.23)

(The circle ‘$\circ$’ indicates that we restrict to the unit sphere of the norm $\| \cdot \|_*$.)


In the following we will use the expressions state and density operator interchangeably. We
also use the set $\mathcal{S}$, which contains all positive semi-definite operators, if there is no need for
normalization.

States form a convex set, and a state is called mixed if it lies in the interior of this set. The
fully mixed state (in finite dimensions) is denoted $\pi_A := I_A / d_A$. On the other hand, states on the
boundary are called pure. Pure states are represented by density operators with rank one, and can
be written as $\phi_A = |\phi\rangle\langle\phi|_A$ for some $\phi \in \mathcal{H}_A$. With a slight abuse of nomenclature, we often call
the corresponding ket, $|\phi\rangle_A$, a state.
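
Born's rule (2.22) and the POVM normalization can be made concrete with exact rational arithmetic. The sketch below assumes a qubit, an arbitrarily chosen density operator `rho`, and a two-outcome POVM; all numerical values and helper names are assumptions made for illustration.

```python
from fractions import Fraction as F

def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(a):
    """Sum of the diagonal entries of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))

rho = [[F(3, 4), F(1, 4)], [F(1, 4), F(1, 4)]]    # a normalized qubit density operator
povm = [
    [[F(1, 2), F(0)], [F(0), F(1, 4)]],           # M(0)
    [[F(1, 2), F(0)], [F(0), F(3, 4)]],           # M(1) = I - M(0)
]

# The POVM elements must sum to the identity.
total = [[sum(m[i][j] for m in povm) for j in range(2)] for i in range(2)]

# Born's rule: Pr(M) = Tr(rho M) for each event M of the POVM.
probs = [trace(matmul(rho, m)) for m in povm]
print(total)
print(probs)
```

Because the state is normalized and the events sum to the identity, the resulting probabilities add up to one.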


Probability Mass Functions 

The structure of density operators simplifies considerably for classical systems. We are interested
in evaluating the probabilities for events of the form (2.15). Hence, for any $\rho_X \in \mathcal{S}_\circ(X)$, we find

    $\Pr_\rho(M) = \operatorname{Tr}(\rho_X M) = \sum_x M(x) \, \langle x | \rho_X | x \rangle_X = \sum_x M(x) \rho(x)$ ,    (2.24)

where we defined $\rho(x) = \langle x | \rho_X | x \rangle_X$. We thus see that it suffices to consider states of the
following form:

States $\rho_X \in \mathcal{S}_\circ(X)$ on a classical system X have the form

    $\rho_X = \sum_x \rho(x) \, |x\rangle\langle x|_X$, where $\rho(x) \geq 0$, $\sum_x \rho(x) = 1$ ,    (2.25)

where $\rho(x)$ is called a probability mass function.

Moreover, if $\rho_X \in \mathcal{S}_\bullet(X)$ is a sub-normalized density operator, we require that $\sum_x \rho(x) \leq 1$
instead of the equality. Again, whenever we introduce a density operator $\rho_X$ on X, we implicitly
also introduce the function $\rho(x)$, and vice versa.


2.4 Multi-Partite Systems 

A joint system AB is modeled using bounded linear operators on a tensor product of Hilbert
spaces, $\mathcal{H}_{AB} := \mathcal{H}_A \otimes \mathcal{H}_B$. The respective set of bounded linear operators is denoted $\mathcal{L}(AB)$ and
the events on the joint systems are thus the elements of $\mathcal{P}_\bullet(AB)$. All the other sets
of operators defined in the previous sections are defined analogously for the joint system.


2.4.1 Tensor Product Spaces

For every $v \in \mathcal{H}_{AB}$ on the joint system AB, there exist two ONBs, $\{e_x\}_x$ on A and $\{f_y\}_y$ on B, as
well as a unique set of positive reals, $\{\lambda_x\}_x$, such that we can write

    $|v\rangle_{AB} = \sum_x \sqrt{\lambda_x} \, |e_x\rangle_A \otimes |f_x\rangle_B$ .    (2.26)

This is called the Schmidt decomposition of v. The convention to use a square root is motivated by
the fact that the sequence $\{\sqrt{\lambda_x}\}_x$ is square summable, i.e. $\sum_x \lambda_x < \infty$. Note also that $\{e_x \otimes f_y\}_{x,y}$
can be extended to an ONB on the joint system AB.


Embedding Linear Operators

We embed the bounded linear operators $\mathcal{L}(A)$ into $\mathcal{L}(AB)$ by taking a tensor product with the
identity on B. We often omit to write this identity explicitly and instead use subscripts to indicate
on which system an operator acts. For example, for any $L_A \in \mathcal{L}(A)$ and $|v\rangle_{AB} \in \mathcal{H}_{AB}$ as in (2.26),
we write

    $L_A |v\rangle_{AB} = L_A \otimes I_B \, |v\rangle_{AB} = \sum_x \sqrt{\lambda_x} \, \big( L_A |e_x\rangle_A \big) \otimes |f_x\rangle_B$ .    (2.27)

Clearly, $\|L_A \otimes I_B\| = \|L_A\|$, and in fact, more generally for all $L_A \in \mathcal{L}(A)$ and $L_B \in \mathcal{L}(B)$, we
have

    $\|L_A \otimes L_B\| = \|L_A\| \cdot \|L_B\|$ .    (2.28)

We say that two operators $K, L \in \mathcal{L}(A)$ commute if $[K, L] := KL - LK = 0$. Clearly, elements of
$\mathcal{L}(A)$ and $\mathcal{L}(B)$ mutually commute as operators in $\mathcal{L}(AB)$, i.e. for all $L_A \in \mathcal{L}(A)$, $K_B \in \mathcal{L}(B)$,
we have $[L_A \otimes I_B, I_A \otimes K_B] = 0$.

Finally, every linear operator $L_{AB} \in \mathcal{L}(AB)$ has a decomposition

    $L_{AB} = \sum_k L_A^k \otimes L_B^k$, where $L_A^k \in \mathcal{L}(A)$, $L_B^k \in \mathcal{L}(B)$ .    (2.29)

Similarly, every self-adjoint operator $L_{AB} \in \mathcal{L}^\dagger(AB)$ decomposes in the same way but now $L_A^k \in \mathcal{L}^\dagger(A)$
and $L_B^k \in \mathcal{L}^\dagger(B)$ can be chosen self-adjoint as well. However, crucially, it is not always
possible to decompose a positive semi-definite operator into products of positive semi-definite
operators in this way.


Representing Traces of Matrix Products Using Tensor Spaces 

Let us next consider trace terms of the form $\operatorname{Tr}_A(K_A L_A)$ where $K_A, L_A \in \mathcal{L}(A)$ are general linear
operators and $\mathcal{H}_A$ is finite-dimensional. It is often convenient to represent such traces as follows.

First, we introduce an auxiliary system A' such that $\mathcal{H}_A$ and $\mathcal{H}_{A'}$ are isomorphic (i.e. they have
the same dimension). Furthermore, we fix a pair of bases $\{|e_x\rangle_A\}_x$ of A and $\{|e_x\rangle_{A'}\}_x$ of A'. (We
can use the same index set here since these spaces are isomorphic.) Clearly every linear operator
on A has a natural embedding into A' given by this isomorphism. Using these bases, we further
define a rank one operator $\Psi \in \mathcal{P}(AA')$ in its Schmidt decomposition as

    $|\Psi\rangle_{AA'} = \sum_x |e_x\rangle_A \otimes |e_x\rangle_{A'}$ .    (2.30)

(Note that this state has norm $\|\Psi\|_* = d_A$, which is why this discussion is restricted to finite
dimensions.) Using the matrix representation of the transpose in (2.11), we now observe that
$L_A \otimes I_{A'} |\Psi\rangle_{AA'} = I_A \otimes L_{A'}^T |\Psi\rangle_{AA'}$ and, therefore,

    $\operatorname{Tr}(K_A L_A) = \langle \Psi | K_A L_A | \Psi \rangle = \langle \Psi |_{AA'} \, K_A \otimes L_{A'}^T \, | \Psi \rangle_{AA'}$ .    (2.31)

We will encounter this representation many times and keep $\Psi$ thus reserved for this purpose,
without going through the construction explicitly every time.⁶
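
The identity (2.31) is easy to verify numerically. The following sketch assumes $d_A = 2$, the standard basis on both A and A', and two arbitrary example matrices; the helpers `kron` and `transpose` are illustrative and use the standard Kronecker-product ordering of tensor factors.

```python
def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def transpose(a):
    """Transpose with respect to the standard basis."""
    return [list(col) for col in zip(*a)]

d = 2
K = [[1, 2], [3, 4]]
L = [[0, 1j], [1, 5]]

# |Psi> = sum_x |x>|x> as a column vector of length d*d (unnormalized).
psi = [[1 if i // d == i % d else 0] for i in range(d * d)]
bra_psi = [[row[0] for row in psi]]       # real entries, so no conjugation is needed

lhs = sum(matmul(K, L)[i][i] for i in range(d))                   # Tr(K L)
rhs = matmul(bra_psi, matmul(kron(K, transpose(L)), psi))[0][0]   # <Psi|K (x) L^T|Psi>
print(lhs, rhs)
```

Both expressions evaluate to the same complex number, as (2.31) predicts.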


Marginals of Functionals 

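Given a bipartite system AB that consists of two sets of operators $\mathcal{L}(A)$ and $\mathcal{L}(B)$, we now want
to specify how a trace-class operator $\xi_{AB} \in \mathcal{T}(AB)$ acts on $\mathcal{L}(A)$. For any $L_A \in \mathcal{L}(A)$, we have

    $F_{\xi_{AB}}(L_A) = \operatorname{Tr}\big( \xi_{AB}^\dagger \, L_A \otimes I_B \big) = \operatorname{Tr}_A\big( \operatorname{Tr}_B(\xi_{AB}^\dagger) \, L_A \big)$    (2.32)

where we simply used that $\operatorname{Tr}_{AB}(\cdot) = \operatorname{Tr}_A(\operatorname{Tr}_B(\cdot))$ where $\operatorname{Tr}_B$ as defined in (2.16) naturally embeds
as a map from $\mathcal{L}(AB)$ into $\mathcal{L}(A)$, i.e.

    $\operatorname{Tr}_B(X_{AB}) = \sum_x \big( I_A \otimes \langle e_x|_B \big) \, X_{AB} \, \big( I_A \otimes |e_x\rangle_B \big)$ .    (2.33)

This is also called the partial trace and will be discussed further in the context of completely
bounded maps in Section 2.6.2.

The above discussion allows us to define the marginal on A of the trace-class operator $\xi_{AB} \in \mathcal{T}(AB)$
as follows:

    $\xi_A := \operatorname{Tr}_B(\xi_{AB})$ such that $F_{\xi_{AB}}(L_A) = F_{\xi_A}(L_A) = \langle \xi_A, L_A \rangle$ .    (2.34)

We usually do not introduce marginals explicitly. For example, if we introduce a trace-class
operator $\xi_{AB}$ then its marginals $\xi_A$ and $\xi_B$ are implicitly defined as well.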
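
The partial trace in (2.33) amounts to summing matrix entries over the index of system B. A minimal sketch, assuming two qubits, the standard basis on B, and the Kronecker-product ordering of the tensor factors; the function name `partial_trace_B` and the example operators are assumptions made for illustration.

```python
from fractions import Fraction as F

def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def partial_trace_B(x_ab, d_a, d_b):
    """Tr_B(X_AB) = sum_k (I_A (x) <k|_B) X_AB (I_A (x) |k>_B)."""
    return [[sum(x_ab[i * d_b + k][j * d_b + k] for k in range(d_b))
             for j in range(d_a)] for i in range(d_a)]

# A product operator: its marginal on A is Tr(xi_b) * xi_a.
xi_a = [[F(2, 3), F(1, 3)], [F(1, 3), F(1, 3)]]
xi_b = [[F(1, 4), F(0)], [F(0), F(3, 4)]]
product = kron(xi_a, xi_b)

# The normalized operator |Psi><Psi| / 2 built from |Psi> = |00> + |11>.
psi = [1, 0, 0, 1]
entangled = [[F(a * b, 2) for b in psi] for a in psi]

marginal_product = partial_trace_B(product, 2, 2)
marginal_entangled = partial_trace_B(entangled, 2, 2)
print(marginal_product)
print(marginal_entangled)
```

The marginal of the product operator recovers `xi_a` (since `xi_b` has unit trace), while the marginal of the entangled operator is the fully mixed state on A.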


2.4.2 Separable States and Entanglement 

The occurrence of entangled states on two or more quantum systems is one of the most intriguing
features of the formalism of quantum mechanics.

We call a positive operator $M_{AB} \in \mathcal{P}(AB)$ of a joint quantum system AB separable if it can
be written in the form

    $M_{AB} = \sum_{k \in \mathcal{K}} L_A(k) \otimes K_B(k)$, where $L_A(k) \in \mathcal{P}(A)$, $K_B(k) \in \mathcal{P}(B)$,    (2.35)

for some index set $\mathcal{K}$. Otherwise, it is called entangled.


The prime example of an entangled state is the maximally entangled state. For two quantum
systems A and B of finite dimension, a maximally entangled state is a state of the form

    $|\psi\rangle_{AB} = \frac{1}{\sqrt{d}} \sum_x |e_x\rangle_A \otimes |f_x\rangle_B$ , $d = \min\{d_A, d_B\}$ ,    (2.36)

where $\{e_x\}_x$ is an ONB of A and $\{f_x\}_x$ is an ONB of B.

⁶ Note that $\Psi$ is an (unnormalized) maximally entangled state, usually denoted $\psi$.

This state cannot be written in the form (2.35) as the following argument, due to Peres [131]
and Horodecki [89], shows. Consider the operation $(\cdot)^{T_B}$ of taking a partial transpose on the
system B with regards to $\{f_x\}_x$ on B. Applied to separable states of the form (2.35), this always
results in a state, i.e.

    $M_{AB}^{T_B} = \sum_k L_A(k) \otimes \big( K_B(k) \big)^T \geq 0$    (2.37)

is positive semi-definite. Applied to $\psi_{AB}$, however, we get

    $\psi_{AB}^{T_B} = \frac{1}{d} \sum_{x,x'} |e_x\rangle\langle e_{x'}| \otimes \big( |f_x\rangle\langle f_{x'}| \big)^T = \frac{1}{d} \sum_{x,x'} |e_x\rangle\langle e_{x'}| \otimes |f_{x'}\rangle\langle f_x|$ .    (2.38)

This operator is not positive semi-definite. For example, we have

    $\langle \phi | \psi_{AB}^{T_B} | \phi \rangle = -\frac{2}{d}$ , where $|\phi\rangle = |e_1\rangle \otimes |f_2\rangle - |e_2\rangle \otimes |f_1\rangle$ .    (2.39)

Generally, we have seen that a bipartite state is separable only if it remains positive
semi-definite under the partial transpose. The converse is not true in general.
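
The negative expectation value in (2.39) can be reproduced with exact arithmetic. The sketch below assumes two qubits (d = 2) with the standard basis playing the role of both $\{e_x\}$ and $\{f_x\}$; the function name `partial_transpose_B` is an illustrative choice.

```python
from fractions import Fraction as F

def partial_transpose_B(x_ab, d_a, d_b):
    """Transpose the B factor: entry ((i,k),(j,l)) moves to ((i,l),(j,k))."""
    n = d_a * d_b
    out = [[F(0)] * n for _ in range(n)]
    for i in range(d_a):
        for j in range(d_a):
            for k in range(d_b):
                for l in range(d_b):
                    out[i * d_b + l][j * d_b + k] = x_ab[i * d_b + k][j * d_b + l]
    return out

d = 2
# psi_AB = (1/d) sum_{x,x'} |x><x'| (x) |x><x'| in the standard basis.
vec = [1 if i // d == i % d else 0 for i in range(d * d)]
psi_ab = [[F(a * b, d) for b in vec] for a in vec]
psi_pt = partial_transpose_B(psi_ab, d, d)

# |phi> = |e_1>|f_2> - |e_2>|f_1>, i.e. (0, 1, -1, 0) in the standard basis.
phi = [0, 1, -1, 0]
expectation = sum(phi[r] * psi_pt[r][c] * phi[c]
                  for r in range(d * d) for c in range(d * d))
print(expectation)
```

A strictly negative expectation value certifies that the partially transposed operator is not positive semi-definite, and hence that the state is entangled.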


2.4.3 Purification 

Consider any state $\rho_{AB} \in \mathcal{S}(AB)$, and its marginals $\rho_A$ and $\rho_B$. Then we say that $\rho_{AB}$ is an
extension of $\rho_A$ and $\rho_B$. Moreover, if $\rho_{AB}$ is pure, we call it a purification of $\rho_A$ and $\rho_B$. Furthermore, we
can always construct a purification of a given state $\rho_A \in \mathcal{S}(A)$. Let us say that $\rho_A$ has eigenvalue
decomposition

    $\rho_A = \sum_x \lambda_x \, |e_x\rangle\langle e_x|_A$ , then the state $|\rho\rangle_{AA'} = \sum_x \sqrt{\lambda_x} \, |e_x\rangle_A \otimes |e_x\rangle_{A'}$    (2.40)

is a purification of $\rho_A$. Here, A' is an auxiliary system of the same dimension as A and $\{|e_x\rangle_{A'}\}_x$
is any ONB of A'. Clearly, $\operatorname{Tr}_{A'}(\rho_{AA'}) = \rho_A$.
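
The construction in (2.40) can be checked by building the purification and tracing out the auxiliary system. This sketch assumes a qubit state that is already diagonal in the standard basis, with example eigenvalues 3/4 and 1/4; the helper name `partial_trace_second` is hypothetical.

```python
import math

def partial_trace_second(x, d_a, d_b):
    """Trace out the second tensor factor of a (d_a*d_b)-dimensional matrix."""
    return [[sum(x[i * d_b + k][j * d_b + k] for k in range(d_b))
             for j in range(d_a)] for i in range(d_a)]

eigenvalues = [0.75, 0.25]        # lambda_x of rho_A in its eigenbasis {|e_x>}
d = len(eigenvalues)

# |rho>_{AA'} = sum_x sqrt(lambda_x) |e_x>_A (x) |e_x>_{A'}
ket = [math.sqrt(eigenvalues[i // d]) if i // d == i % d else 0.0
       for i in range(d * d)]
rho_aa = [[a * b for b in ket] for a in ket]      # |rho><rho| (real entries)

marginal = partial_trace_second(rho_aa, d, d)
print(marginal)
```

Up to floating-point rounding, the marginal is the diagonal matrix of the chosen eigenvalues, i.e. the original state.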


2.4.4 Classical-Quantum Systems 

An important special case are joint systems where one part consists of a classical system. Events
$M \in \mathcal{P}_\bullet(XA)$ on such joint systems can be decomposed as

    $M_{XA} = \sum_x |x\rangle\langle x|_X \otimes M_A(x) = \bigoplus_x M_A(x)$, where $M_A(x) \in \mathcal{P}_\bullet(A)$ .    (2.41)

Moreover, we call states of such systems classical-quantum states. For example, consistent
with our notation for classical systems in (2.25), a state $\rho_{XA} \in \mathcal{S}_\bullet(XA)$ can be decomposed as

    $\rho_{XA} = \sum_x |x\rangle\langle x|_X \otimes \rho_A(x)$, where $\rho_A(x) \geq 0$, $\sum_x \operatorname{Tr}(\rho_A(x)) \leq 1$ .    (2.42)

Clearly, $\rho_A(x) \in \mathcal{S}_\bullet(A)$ is a sub-normalized density operator on A. Furthermore, comparing
with (2.35), it is evident that such states are always separable.

If $\rho_{XA} \in \mathcal{S}_\circ(XA)$, it is sometimes more convenient to instead further decompose

    $\rho_A(x) = \rho(x) \, \hat{\rho}_A(x)$ ,    (2.43)

where $\rho(x)$ is a probability mass function and $\hat{\rho}_A(x) \in \mathcal{S}_\circ(A)$ normalized as well.


2.5 Functions on Positive Operators 

Besides the inverse, we often need to lift other continuous real-valued functions to positive semi-definite
operators. For any continuous function $f : \mathbb{R}_+ \setminus \{0\} \to \mathbb{R}$ and $M \in \mathcal{P}(A)$ with eigenvalue
decomposition $M = \sum_x \lambda_x |e_x\rangle\langle e_x|$, we use the convention

$$ f(M) = \sum_{x : \lambda_x \neq 0} f(\lambda_x)\, |e_x\rangle\langle e_x| \tag{2.44} $$

if the resulting operator is bounded (e.g. if the spectrum of $M$ is compact). That is, as for the
generalized inverse, we simply ignore the kernel of $M$.^ By definition, we thus have $f(UMU^\dagger) = U f(M) U^\dagger$
for any unitary $U$. Moreover, we have

$$ L f(L^\dagger L) = f(L L^\dagger) L, \tag{2.45} $$

which can be verified using the polar decomposition, stating that we can always write $L = U|L|$
for some unitary operator $U$. An important example is the logarithm, defined as
$\log M = \sum_{x : \lambda_x \neq 0} \log \lambda_x\, |e_x\rangle\langle e_x|$.

Let us in the following restrict our attention to the finite-dimensional case. Notably, trace
functionals of the form $M \mapsto \mathrm{Tr}(f(M))$ inherit continuity, monotonicity, concavity and convexity
from $f$ (see, e.g., [34]). For example, for any monotonically increasing continuous function $f$, we
have

$$ \mathrm{Tr}(f(M)) \leq \mathrm{Tr}(f(N)) \quad \textrm{for all } M, N \in \mathcal{P}(A) \textrm{ with } M \leq N . \tag{2.46} $$
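
To make the convention (2.44) concrete, here is a short NumPy sketch that lifts a scalar function to an operator by acting on the nonzero eigenvalues only, and then checks the identity (2.45) on a random matrix. The name `lift` and the tolerance used to decide which eigenvalues count as zero are assumptions of this sketch.

```python
import numpy as np

def lift(f, M, tol=1e-12):
    """Apply f to the nonzero eigenvalues of a positive semi-definite M, cf. (2.44)."""
    lam, vecs = np.linalg.eigh(M)
    out = np.zeros_like(M, dtype=complex)
    for value, v in zip(lam, vecs.T):
        if value > tol:                        # the kernel of M is ignored
            out += f(value) * np.outer(v, v.conj())
    return out

rng = np.random.default_rng(0)
L = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
f = np.sqrt

lhs = L @ lift(f, L.conj().T @ L)              # L f(L^dagger L)
rhs = lift(f, L @ L.conj().T) @ L              # f(L L^dagger) L

# Generalized inverse of a rank-deficient operator: the kernel stays untouched
P = np.diag([2.0, 0.0])
P_inv = lift(lambda t: 1.0 / t, P)
```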


^ This convention is very useful to keep the presentation in the following chapters concise, but some care is
required. If $\lim_{\varepsilon \to 0} f(\varepsilon) \neq 0$, then $M \mapsto f(M)$ is not necessarily continuous even if $f$ is continuous on its support.


Operator Monotone and Concave Functions

Here we discuss classes of functions that, when lifted to positive semi-definite operators, retain
their defining properties. A function $f : \mathbb{R}_+ \to \mathbb{R}$ is called operator monotone if

$$ M \leq N \implies f(M) \leq f(N) \quad \textrm{for all } M, N \geq 0 . \tag{2.47} $$

If $f$ is operator monotone then $-f$ is operator anti-monotone. Furthermore, $f$ is called operator
convex if

$$ \lambda f(M) + (1-\lambda) f(N) \geq f(\lambda M + (1-\lambda) N) \quad \textrm{for all } M, N \geq 0 \tag{2.48} $$

and $\lambda \in [0,1]$. If this holds with the inequality reversed, then the function is called operator
concave. These definitions naturally extend to functions $f : (0,\infty) \to \mathbb{R}$, where we consequently
choose $M, N > 0$.

There exists a rich theory concerning such functions and their properties (see, for example, 
Bhatia’s book [26]), but we will only mention a few prominent examples in Table 2.2 that will be 
of use later. 


function      range           op. monotone        op. anti-monotone     op. convex                          op. concave
$\sqrt{t}$    $[0,\infty)$    yes                 no                    no                                  yes
$t^2$         $[0,\infty)$    no                  no                    yes                                 no
$1/t$         $(0,\infty)$    no                  yes                   yes                                 no
$t^\alpha$    $(0,\infty)$    $\alpha \in [0,1]$  $\alpha \in [-1,0)$   $\alpha \in [-1,0) \cup [1,2]$      $\alpha \in (0,1]$
$\log t$      $(0,\infty)$    yes                 no                    no                                  yes
$t \log t$    $[0,\infty)$    no                  no                    yes                                 no

Table 2.2 Examples of Operator Monotone, Concave and Convex Functions. Note in particular that $t^\alpha$ is
neither operator monotone, convex nor concave for $\alpha < -1$ and $\alpha > 2$.
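
Two of the table entries can be illustrated with a pair of $2 \times 2$ matrices. The sketch below uses NumPy and a standard textbook counterexample (the matrices `M` and `N` are our choice, not taken from the text) to show that $t \mapsto t^2$ is not operator monotone, while $t \mapsto \sqrt{t}$ respects the operator order on the same pair.

```python
import numpy as np

def is_psd(X, tol=1e-12):
    """True if the symmetric matrix X has no eigenvalue below -tol."""
    return np.linalg.eigvalsh(X).min() >= -tol

def psd_sqrt(X):
    """Square root of a positive semi-definite matrix via its eigendecomposition."""
    lam, vecs = np.linalg.eigh(X)
    return (vecs * np.sqrt(np.clip(lam, 0.0, None))) @ vecs.T

M = np.array([[1.0, 1.0], [1.0, 1.0]])
N = np.array([[2.0, 1.0], [1.0, 1.0]])

ordered = is_psd(N - M)                              # M <= N holds
square_monotone = is_psd(N @ N - M @ M)              # fails for t -> t^2
sqrt_monotone = is_psd(psd_sqrt(N) - psd_sqrt(M))    # holds for t -> sqrt(t)
```

Here $N^2 - M^2$ has determinant $-1$, so it has a negative eigenvalue even though $N - M \geq 0$.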


We say that a two-parameter function is jointly concave (jointly convex) if it is concave (convex)
when we take convex combinations of input tuples. Lieb [106] and Ando [4] established the
following extremely powerful result. The map

$$ \mathcal{P}(A) \times \mathcal{P}(B) \to \mathcal{P}(AB), \quad (M_A, N_B) \mapsto f\big(M_A \otimes N_B^{-1}\big)\, M_A \otimes I_B \tag{2.49} $$

is jointly convex on strictly positive operators if $f : (0,\infty) \to \mathbb{R}$ is operator monotone. This is
Ando’s convexity theorem [4]. In particular, we find that the functional

$$ (M_A, N_B) \mapsto \langle\Psi|\, K \cdot \big(M_A \otimes N_B^{-1}\big)^{\alpha-1} M_A \cdot K^\dagger\, |\Psi\rangle = \mathrm{Tr}\big(M^\alpha K^\dagger N^{1-\alpha} K\big) \tag{2.50} $$

for any $K \in \mathcal{L}(A,B)$ is jointly concave for $\alpha \in (0,1)$ and jointly convex for $\alpha \in (1,2)$. The former
is known as Lieb’s concavity theorem. Since this will be used extensively, we include a derivation
of this particular result in Appendix A.


2.6 Quantum Channels 


Quantum channels are used to model the time evolution of physical systems. There are two equivalent
ways to model a quantum channel, and we will see that they are intimately related. In the
Schrödinger picture, the events are fixed and the state of a system is time dependent. Consequently,
we model evolutions as quantum channels acting on the space of density operators. In
the Heisenberg picture, the observable events are time dependent and the state of a system is fixed,
and we thus model evolutions as adjoint quantum channels acting on events.


2.6.1 Completely Bounded Maps 

Here, we introduce linear maps between bounded linear operators on different systems, and their
adjoints, which map between functionals on different systems. For later convenience, we use
calligraphic letters to denote the latter maps, for example $\mathcal{E}$ and $\mathcal{F}$, and use the adjoint notation
for maps between bounded linear operators. The action of a linear map on an operator in a tensor
space is well-defined by linearity via the decomposition in (2.29), and as for linear operators, we
usually omit to make this embedding explicit.

The set of completely bounded (CB) linear maps from $\mathcal{L}(A)$ to $\mathcal{L}(B)$ is denoted by CB(A,B).
Completely bounded maps $\mathcal{E}^\dagger \in \mathrm{CB}(A,B)$ have the defining property that for any operator $L_{AC} \in \mathcal{L}(AC)$
and any auxiliary system C, we have $\|\mathcal{E}^\dagger(L_{AC})\| < \infty$. We then define the linear map
$\mathcal{E}$ from $\mathcal{T}(A)$ to $\mathcal{T}(B)$ as the adjoint map for some $\mathcal{E}^\dagger \in \mathrm{CB}(B,A)$ via the sesquilinear form.
Namely, $\mathcal{E}$ is defined as the unique linear map satisfying

$$ \langle \mathcal{E}(\xi), L \rangle = \langle \xi, \mathcal{E}^\dagger(L) \rangle \quad \textrm{for all } \xi \in \mathcal{T}(A),\ L \in \mathcal{L}(B) . \tag{2.51} $$

Clearly, $\mathcal{E}$ maps $\mathcal{T}(A)$ into $\mathcal{T}(B)$. Moreover, for any $\xi_{AC}$ in $\mathcal{T}(AC)$ we have

$$ \|\mathcal{E}(\xi_{AC})\|_* = \sup\big\{ |\langle \xi_{AC}, \mathcal{E}^\dagger(L_{BC}) \rangle| : L_{BC} \in \mathcal{L}_\bullet(BC) \big\} < \infty . \tag{2.52} $$

So these maps are in fact completely bounded in the trace norm and we collect them in the set
$\mathrm{CB}_*(A,B)$. Again, in finite dimensions CB(A,B) and $\mathrm{CB}_*(A,B)$ coincide.


2.6.2 Quantum Channels 

Physical channels necessarily map positive functionals onto positive functionals. A map $\mathcal{E} \in \mathrm{CB}_*(A,B)$
is called completely positive (CP) if it maps $\mathcal{S}(AC)$ to $\mathcal{S}(BC)$ for any auxiliary system
C, namely if

$$ \langle \mathcal{E}(\omega_{AC}), M_{BC} \rangle \geq 0 \quad \textrm{for all } \omega \in \mathcal{S}(AC),\ M \in \mathcal{P}(BC) . \tag{2.53} $$

A map $\mathcal{E}$ is CP if and only if $\mathcal{E}^\dagger$ is CP, in the respective sense. The set of all CP maps from $\mathcal{T}(A)$
to $\mathcal{T}(B)$ is denoted CP(A,B).

Physical channels in the Schrödinger picture are modeled by completely positive trace-preserving
maps, or quantum channels.


** It is noteworthy that the weaker condition that the map be bounded, i.e. $\|\mathcal{E}^\dagger(L_A)\| < \infty$, is not sufficient here and
in particular does not imply that the map is completely bounded. In contrast, bounded linear operators in $\mathcal{L}(A)$
are in fact also completely bounded in the above sense.

A quantum channel is a map $\mathcal{E} \in \mathrm{CP}(A,B)$ that is trace-preserving, namely a map that
satisfies

$$ \mathrm{Tr}(\mathcal{E}(\xi)) = \mathrm{Tr}(\xi) \quad \textrm{for all } \xi \in \mathcal{T}(A) . \tag{2.54} $$

Naturally, such maps take states to states, more precisely, they map $\mathcal{S}_\circ(A)$ to $\mathcal{S}_\circ(B)$ and $\mathcal{S}_\bullet(A)$
to $\mathcal{S}_\bullet(B)$. The corresponding adjoint quantum channel $\mathcal{E}^\dagger$ from $\mathcal{L}(B)$ to $\mathcal{L}(A)$ in the Heisenberg
picture is a completely positive and unital map, namely it satisfies $\mathcal{E}^\dagger(I_B) = I_A$. In fact, a map $\mathcal{E}$
is trace-preserving if and only if $\mathcal{E}^\dagger$ is unital. Unital maps take $\mathcal{P}_\bullet(B)$ to $\mathcal{P}_\bullet(A)$ and thus map
events to events. Clearly,

$$ \Pr_{\mathcal{E}(\rho)}(M) = \langle \mathcal{E}(\rho), M \rangle = \langle \rho, \mathcal{E}^\dagger(M) \rangle = \Pr_{\rho}\big(\mathcal{E}^\dagger(M)\big) . \tag{2.55} $$

Let us summarize some further notation:

• We denote the set of all completely positive trace-preserving (CPTP) maps from $\mathcal{T}(A)$ to
$\mathcal{T}(B)$ by CPTP(A,B).

• The set of all CP unital maps from $\mathcal{L}(A)$ to $\mathcal{L}(B)$ is denoted CPU(A,B).

• Finally, a map $\mathcal{E} \in \mathrm{CP}(A,B)$ is called trace-non-increasing if $\mathrm{Tr}(\mathcal{E}(\omega)) \leq \mathrm{Tr}(\omega)$ for all $\omega \in \mathcal{S}(A)$.
A CP map is trace-non-increasing if and only if its adjoint is sub-unital, i.e. it satisfies $\mathcal{E}^\dagger(I_B) \leq I_A$.


Some Examples of Channels 

The simplest example of such a CP map is the conjugation with an operator $L \in \mathcal{L}(A,B)$, that
is the map $\mathcal{L} : \xi \mapsto L \xi L^\dagger$. We will often use the following basic property of completely positive
maps. Let $\mathcal{E} \in \mathrm{CP}(A,B)$, then

$$ \xi \geq \zeta \implies \mathcal{E}(\xi) \geq \mathcal{E}(\zeta) \quad \textrm{for all } \xi, \zeta \in \mathcal{T}(A) . \tag{2.56} $$

As a consequence, we take note of the following property of positive semi-definite operators.
For any $M \in \mathcal{P}(A)$, $\xi \in \mathcal{S}(A)$, we have

$$ \mathrm{Tr}(\xi M) = \mathrm{Tr}\big(\sqrt{M}\, \xi \sqrt{M}\big) \geq 0, \tag{2.57} $$

where the last inequality follows from the fact that the conjugation with $\sqrt{M}$ is a completely
positive map. In particular, if $L, K \in \mathcal{L}^\dagger(A)$ satisfy $L \geq K$, we find $\mathrm{Tr}(\xi L) \geq \mathrm{Tr}(\xi K)$.

An instructive example is the embedding map $L_A \mapsto L_A \otimes I_B$, which is completely bounded, CP
and unital. Its adjoint map is the CPTP map $\mathrm{Tr}_B$, the partial trace, as we have seen in Section 2.4.1.
Finally, for a POVM $x \mapsto M_A(x)$, we consider the measurement map $\mathcal{M} \in \mathrm{CPTP}(A,X)$ given by

$$ \mathcal{M} : \rho_A \mapsto \sum_x |x\rangle\langle x|_X\, \mathrm{Tr}\big(\rho_A M_A(x)\big) . \tag{2.58} $$

This maps a quantum system into a classical system with a state corresponding to the probability
mass function $\rho(x) = \mathrm{Tr}(\rho_A M_A(x))$ that arises from Born’s rule. If the events $\{M_A(x)\}_x$ are rank-one
projectors, then this map is also unital.


2.6.3 Pinching and Dephasing Channels 

Pinching maps (or channels) constitute a particularly important class of quantum channels that
we will use extensively in our technical derivations. A pinching map is a channel of the form
$\mathcal{P} : L \mapsto \sum_x P_x L P_x$ where $\{P_x\}_x$, $x \in [m]$, are orthogonal projectors that sum up to the identity.
Such maps are CPTP, unital and equal to their own adjoints. Alternatively, we can see them
as dephasing operations that remove off-diagonal blocks of a matrix. They have two equivalent
representations:

$$ \mathcal{P}(L) = \sum_{x \in [m]} P_x L P_x = \frac{1}{m} \sum_{y \in [m]} U_y L U_y^\dagger, \quad \textrm{where } U_y = \sum_{x \in [m]} e^{\frac{2\pi i\, y x}{m}} P_x \tag{2.59} $$

are unitary operators. Note also that $U_m = I$.

For any self-adjoint operator $H \in \mathcal{L}^\dagger(A)$ with eigenvalue decomposition $H = \sum_x \lambda_x |e_x\rangle\langle e_x|$, we
define the set $\mathrm{spec}(H) = \{\lambda_x\}_x$ and its cardinality, $|\mathrm{spec}(H)|$, is the number of distinct eigenvalues
of $H$. For each $\lambda \in \mathrm{spec}(H)$, we also define $P_\lambda = \sum_{x : \lambda_x = \lambda} |e_x\rangle\langle e_x|$ such that $H = \sum_\lambda \lambda P_\lambda$ is its
spectral decomposition. Then, the pinching map for this spectral decomposition is denoted

$$ \mathcal{P}_H : L \mapsto \sum_{\lambda \in \mathrm{spec}(H)} P_\lambda L P_\lambda . \tag{2.60} $$

Clearly, $\mathcal{P}_H(H) = H$, $\mathcal{P}_H(L)$ commutes with $H$, and $\mathrm{Tr}(\mathcal{P}_H(L) H) = \mathrm{Tr}(L H)$.

For any $M \in \mathcal{P}(A)$, using the second expression in (2.59) and the fact that $U_x M U_x^\dagger \geq 0$, we
immediately arrive at

$$ \mathcal{P}_H(M) = \frac{1}{|\mathrm{spec}(H)|} \sum_{y \in [m]} U_y M U_y^\dagger \geq \frac{1}{|\mathrm{spec}(H)|}\, M . \tag{2.61} $$

This is Hayashi’s pinching inequality [74].
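
Both representations in (2.59) and the pinching inequality (2.61) are easy to test numerically. The following NumPy sketch assumes a diagonal $H$ with three distinct eigenvalues and a random positive semi-definite $M$; the helper names and the rounding used to group degenerate eigenvalues are choices made only for this illustration.

```python
import numpy as np

def spectral_projectors(H, decimals=9):
    """Projectors P_lambda onto the eigenspaces of a Hermitian H."""
    lam, vecs = np.linalg.eigh(H)
    groups = {}
    for value, v in zip(np.round(lam, decimals), vecs.T):
        groups.setdefault(value, []).append(np.outer(v, v.conj()))
    return [sum(ps) for ps in groups.values()]

def pinch(L, projectors):
    """First representation in (2.59): sum_x P_x L P_x."""
    return sum(P @ L @ P for P in projectors)

def pinch_unitary(L, projectors):
    """Second representation in (2.59): average over U_y = sum_x e^{2 pi i yx/m} P_x."""
    m = len(projectors)
    out = np.zeros_like(L, dtype=complex)
    for y in range(1, m + 1):
        U = sum(np.exp(2j * np.pi * y * x / m) * P
                for x, P in enumerate(projectors, start=1))
        out += U @ L @ U.conj().T / m
    return out

H = np.diag([1.0, 1.0, 2.0, 3.0])              # three distinct eigenvalues
projs = spectral_projectors(H)

rng = np.random.default_rng(1)
G = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
M = G @ G.conj().T                             # random positive semi-definite operator

pinched = pinch(M, projs)
gap = len(projs) * pinched - M                 # Hayashi: |spec(H)| P_H(M) - M >= 0
min_gap_eig = np.linalg.eigvalsh(gap).min()
```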

Finally, if $f$ is operator concave, then for every pinching $\mathcal{P}$, we have

$$ f(\mathcal{P}(M)) = f\Big( \frac{1}{m} \sum_{x \in [m]} U_x M U_x^\dagger \Big) \geq \frac{1}{m} \sum_{x \in [m]} f\big(U_x M U_x^\dagger\big) \tag{2.62} $$

$$ = \frac{1}{m} \sum_{x \in [m]} U_x f(M) U_x^\dagger = \mathcal{P}(f(M)) . \tag{2.63} $$

This is a special case of the operator Jensen inequality established by Hansen and Pedersen [71].
For all $H \in \mathcal{L}^\dagger(A)$, every operator concave function $f$ defined on the spectrum of $H$, and all
unital maps $\mathcal{E} \in \mathrm{CPU}(A,B)$, we have

$$ f(\mathcal{E}(H)) \geq \mathcal{E}(f(H)) . \tag{2.64} $$


2.6.4 Channel Representations

The following representations for trace non-increasing and trace preserving CP maps are of crucial
importance in quantum information theory.


Kraus Operators 

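Every CP map can be represented as a sum of conjugations of the input [82,83]. More precisely,
$\mathcal{E} \in \mathrm{CP}(A,B)$ if and only if there exists a set of linear operators $\{E_k\}_k$, $E_k \in \mathcal{L}(A,B)$, such that

$$ \mathcal{E}(\xi) = \sum_k E_k\, \xi\, E_k^\dagger \quad \textrm{for all } \xi \in \mathcal{T}(A) . \tag{2.65} $$

Furthermore, such a channel is trace-preserving if and only if $\sum_k E_k^\dagger E_k = I$, and trace-non-increasing
if and only if $\sum_k E_k^\dagger E_k \leq I$. The operators $\{E_k\}$ are called Kraus operators. Moreover,
the adjoint of $\mathcal{E}$ is completely positive and has Kraus operators $\{E_k^\dagger\}$ since

$$ \mathrm{Tr}\big(\xi\, \mathcal{E}^\dagger(L)\big) = \mathrm{Tr}\big(\mathcal{E}(\xi) L\big) = \mathrm{Tr}\Big( \xi \sum_k E_k^\dagger L E_k \Big) . \tag{2.66} $$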


Stinespring Dilation 

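Moreover, every CP map can be decomposed into its Stinespring dilation [147]. That is, $\mathcal{E} \in \mathrm{CP}(A,B)$
if and only if there exists a system C and an operator $L \in \mathcal{L}(A,BC)$ such that

$$ \mathcal{E}(\xi) = \mathrm{Tr}_C\big(L \xi L^\dagger\big) \quad \textrm{for all } \xi \in \mathcal{T}(A) . \tag{2.67} $$

Moreover, if $\mathcal{E}$ is trace-preserving then $L = U$, where $U \in \mathcal{L}_\bullet(A,BC)$ is an isometry. If $\mathcal{E}$ is trace-non-increasing,
then $L = PU$ is an isometry followed by a projection $P \in \mathcal{P}_\bullet(BC)$.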


Choi-Jamiolkowski Isomorphism 

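For finite-dimensional Hilbert spaces, the Choi-Jamiolkowski isomorphism [96] between bounded
linear maps from A to B and linear functionals on $A'B$ is given by

$$ \Gamma : \mathcal{L}(\mathcal{T}(A), \mathcal{T}(B)) \to \mathcal{T}(A'B), \quad \mathcal{E} \mapsto \gamma^{\mathcal{E}}_{A'B} = \mathcal{E}\big(|\Psi\rangle\langle\Psi|_{AA'}\big), \tag{2.68} $$

where the state $\gamma^{\mathcal{E}}_{A'B}$ is called the Choi-Jamiolkowski state of $\mathcal{E}$. The inverse operation, $\Gamma^{-1}$, maps
linear functionals to bounded linear maps

$$ \Gamma^{-1} : \gamma_{A'B} \mapsto \Big\{ \mathcal{E}^{\gamma} : \rho_A \mapsto \mathrm{Tr}_{A'}\big( \gamma_{A'B} (I_B \otimes \rho_{A'}^{T}) \big) \Big\}, \tag{2.69} $$

where the transpose is taken with regards to the Schmidt basis of $\Psi$.

There are various relations between properties of bounded linear maps and properties of the
corresponding Choi-Jamiolkowski functionals, for example:

$$ \mathcal{E} \textrm{ is completely positive} \iff \gamma^{\mathcal{E}}_{A'B} \geq 0, \tag{2.70} $$

$$ \mathcal{E} \textrm{ is trace-preserving} \iff \mathrm{Tr}_B\big(\gamma^{\mathcal{E}}_{A'B}\big) = I_{A'}, \tag{2.71} $$

$$ \mathcal{E} \textrm{ is unital} \iff \mathrm{Tr}_{A'}\big(\gamma^{\mathcal{E}}_{A'B}\big) = I_B . \tag{2.72} $$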
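
The Kraus and Choi-Jamiolkowski representations can be connected in a few lines of NumPy. The sketch below uses a qubit amplitude damping channel as a stand-in example (this channel and the damping strength `g` are our illustrative assumptions, not taken from the text) and checks (2.65), (2.69), (2.70) and (2.71).

```python
import numpy as np

def apply_channel(kraus, xi):
    """E(xi) = sum_k E_k xi E_k^dagger, cf. (2.65)."""
    return sum(E @ xi @ E.conj().T for E in kraus)

def choi(kraus, dA):
    """Choi-Jamiolkowski operator on A'B, cf. (2.68), built from the
    unnormalized |Psi> = sum_x |e_x>_A (x) |e_x>_{A'}."""
    dB = kraus[0].shape[0]
    gamma = np.zeros((dA * dB, dA * dB), dtype=complex)
    for x in range(dA):
        for y in range(dA):
            unit = np.zeros((dA, dA))
            unit[x, y] = 1.0                            # matrix unit |e_x><e_y|
            gamma += np.kron(unit, apply_channel(kraus, unit))
    return gamma

def from_choi(gamma, rho, dA, dB):
    """Inverse map (2.69): rho -> Tr_{A'}( gamma (rho^T (x) I_B) )."""
    prod = gamma @ np.kron(rho.T, np.eye(dB))
    return np.einsum('abad->bd', prod.reshape(dA, dB, dA, dB))

g = 0.3                                                 # damping strength (illustrative)
kraus = [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - g)]]),
         np.array([[0.0, np.sqrt(g)], [0.0, 0.0]])]

gamma = choi(kraus, 2)
rho = np.array([[0.25, 0.1j], [-0.1j, 0.75]])

completeness = sum(E.conj().T @ E for E in kraus)       # equals I for a CPTP map
tr_B_gamma = np.einsum('abcb->ac', gamma.reshape(2, 2, 2, 2))
```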


2.7 Background and Further Reading 

Nielsen and Chuang’s book [125] offers a good introduction to the quantum formalism. Hayashi’s [75] 
and Wilde’s [174] books both also carefully treat the concepts relevant for quantum information 
theory in finite dimensions. Finally, Holevo’s recent book [88] offers a comprehensive mathemat¬ 
ical introduction to quantum information processing in finite and infinite dimensions. 

Operator monotone functions and other aspects of matrix analysis are covered in Bhatia’s 
books [26,27], and Hiai and Petz’ book [87]. 


Chapter 3 

Norms and Metrics 


In this chapter we equip the space of quantum states with some additional structure by discussing
various norms and metrics for quantum states. We discuss Schatten norms and an important variational
characterization of these norms, amongst other properties. We go on to discuss the trace
norm on positive semi-definite operators and the trace distance associated with it. Uhlmann’s fidelity
for quantum states is treated next, as well as the purified distance, a useful metric based on
the fidelity.

Particular emphasis is given to sub-normalized quantum states, and the above quantities are
generalized to meaningfully include them. This will be essential for the definition of the smooth
entropies in Chapter 6.


3.1 Norms for Operators and Quantum States 

We restrict ourselves to finite-dimensional Hilbert spaces hereafter. We start by giving a formal
definition for unitarily invariant norms on linear operators. An example of such a norm is the
operator norm $\|\cdot\|$ of the previous chapter.


Definition 3.1. A norm for linear operators is a map $\|\cdot\| : \mathcal{L}(A) \to [0,\infty)$ which satisfies
the following properties, for any $L, K \in \mathcal{L}(A)$.

Positive-definiteness: $\|L\| \geq 0$ with equality if and only if $L = 0$.

Absolute scalability: $\|aL\| = |a| \cdot \|L\|$ for all $a \in \mathbb{C}$.

Subadditivity: $\|L + K\| \leq \|L\| + \|K\|$.

A norm $|||\cdot|||$ is called a unitarily invariant norm if it further satisfies
Unitary invariance: $|||U L V^\dagger||| = |||L|||$ for any isometries $U, V$.

We reserve the notation $|||\cdot|||$ for unitarily invariant norms. Combining subadditivity and scalability,
we note that norms are convex:

$$ \|\lambda L + (1-\lambda) K\| \leq \lambda \|L\| + (1-\lambda) \|K\| \quad \textrm{for all } \lambda \in [0,1] . \tag{3.1} $$




3.1.1 Schatten Norms 

The singular values of a general linear operator $L \in \mathcal{L}(A)$ are the eigenvalues of its modulus, the
positive semi-definite operator $|L| := \sqrt{L^\dagger L}$. The Schatten p-norm of $L$ is then simply defined as
the p-norm of its singular values.

Definition 3.2. For any $L \in \mathcal{L}(A)$, we define the Schatten p-norm of $L$ as

$$ \|L\|_p := \big( \mathrm{Tr}(|L|^p) \big)^{\frac{1}{p}} \quad \textrm{for } p \geq 1 . \tag{3.2} $$


We extend this definition to all $p > 0$, but note that in this case $\|L\|_p$ is not a norm. In particular,
$\|L\|_p$ for $p \in (0,1)$ does not satisfy the subadditivity inequality in Definition 3.1. The operator norm
is recovered in the limit $p \to \infty$. We have

$$ \|L\|_\infty = \|L\|, \qquad \|L\|_2 = \sqrt{\mathrm{Tr}(L^\dagger L)}, \qquad \|L\|_1 = \mathrm{Tr}|L| = \|L\|_* . \tag{3.3} $$

The latter two norms are the Frobenius or Hilbert-Schmidt norm and the trace norm.
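
Since the Schatten p-norm is the vector p-norm of the singular values, it can be computed from a singular value decomposition. The NumPy sketch below (the function name `schatten_norm` is ours) evaluates the three special cases in (3.3) and illustrates how subadditivity fails for $p = 1/2$ on two orthogonal rank-one projectors.

```python
import numpy as np

def schatten_norm(L, p):
    """Schatten p-norm: the vector p-norm of the singular values of L, cf. (3.2)."""
    s = np.linalg.svd(L, compute_uv=False)
    if np.isinf(p):
        return s.max()
    return (s ** p).sum() ** (1.0 / p)

L = np.array([[3.0, 0.0], [4.0, 5.0]])

trace_norm = schatten_norm(L, 1)
frobenius = schatten_norm(L, 2)
operator_norm = schatten_norm(L, np.inf)

# Subadditivity can fail for p < 1: compare two orthogonal rank-one projectors
A = np.diag([1.0, 0.0])
B = np.diag([0.0, 1.0])
p = 0.5
lhs = schatten_norm(A + B, p)                        # equals 2^(1/p) = 4
rhs = schatten_norm(A, p) + schatten_norm(B, p)      # equals 2
```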

The Schatten norms are unitarily invariant and subadditive. Using this and the representation
of pinching channels in (2.59), we find

$$ |||\mathcal{P}(L)||| = \Big|\Big|\Big| \sum_{x \in [m]} \frac{1}{m} U_x L U_x^\dagger \Big|\Big|\Big| \leq \sum_{x \in [m]} \frac{1}{m}\, |||U_x L U_x^\dagger||| = |||L||| . \tag{3.4} $$

This is called the pinching inequality for (unitarily invariant) norms.


Hölder Inequalities and Variational Characterization of Norms

Next we introduce the following powerful generalization of the Hölder and reverse Hölder inequalities
to the trace of linear operators:


Lemma 3.1. Let $L, K \in \mathcal{L}(A)$, $M, N \in \mathcal{P}(A)$ and $p, q \in \mathbb{R}$ such that $p > 0$ and $\frac{1}{p} + \frac{1}{q} = 1$.
Then, we have

$$ |\mathrm{Tr}(LK)| \leq \mathrm{Tr}|LK| \leq \|L\|_p \cdot \|K\|_q \quad \textrm{if } p > 1, \tag{3.5} $$

$$ \mathrm{Tr}(MN) \geq \|M\|_p \cdot \|N^{-1}\|_{-q}^{-1} \quad \textrm{if } p \in (0,1) \textrm{ and } M \ll N . \tag{3.6} $$

Moreover, for every $L$ there exists a $K$ such that equality is achieved in (3.5). In particular,
for $M, N \in \mathcal{P}(A)$, equality is achieved in all inequalities if $M^p = a N^q$ for some constant
$a \geq 0$.


Proof. We omit the proof of the first statement (see, e.g., Bhatia [26, Cor. IV.2.6]).

For $p \in (0,1)$, let us first consider the case where $M$ and $N$ commute. Then, (3.5) yields

$$ \|M\|_p^p = \mathrm{Tr}(M^p) = \mathrm{Tr}(M^p N^p N^{-p}) \leq \|M^p N^p\|_{\frac{1}{p}} \cdot \|N^{-p}\|_{\frac{1}{1-p}} \tag{3.7} $$

$$ = \big( \mathrm{Tr}(MN) \big)^{p} \cdot \big( \mathrm{Tr}(N^{-\frac{p}{1-p}}) \big)^{1-p}, \tag{3.8} $$

which establishes the desired statement. To generalize (3.6) to non-commuting operators, note
that the commutative inequality yields

$$ \mathrm{Tr}(MN) = \mathrm{Tr}(\mathcal{P}_N(M) N) \geq \|\mathcal{P}_N(M)\|_p \cdot \|N^{-1}\|_{-q}^{-1} . \tag{3.9} $$

Moreover, since $t \mapsto t^p$ is operator concave, the operator Jensen inequality (2.64) establishes that

$$ \|\mathcal{P}_N(M)\|_p^p = \mathrm{Tr}\big( (\mathcal{P}_N(M))^p \big) \geq \mathrm{Tr}\big( \mathcal{P}_N(M^p) \big) = \mathrm{Tr}(M^p) . \tag{3.10} $$

Substituting this into (3.9) yields the desired statement for general $M$ and $N$. □


These Hölder inequalities are extremely useful, for example they allow us to derive various
variational characterizations of Schatten norms and trace terms. For $p > 1$, the Hölder inequality
implies norm duality, namely [26, Sec. IV.2]

$$ \|L\|_p = \max_{K \in \mathcal{L}(A),\ \|K\|_q \leq 1} |\mathrm{Tr}(L^\dagger K)| \quad \textrm{for } \frac{1}{p} + \frac{1}{q} = 1,\ p, q > 1 . \tag{3.11} $$

This is a quite useful variational characterization of the Schatten norm, which we extend to
$p \in (0,1)$ using the reverse Hölder inequality. Here we state the resulting variational formula for
positive operators.

Lemma 3.2. Let $M \in \mathcal{P}(A)$ and $p > 0$. Then, for $r = 1 - \frac{1}{p}$, we find

$$ \|M\|_p = \max\big\{ \mathrm{Tr}(M N^r) : N \in \mathcal{S}_\circ(A) \big\} \quad \textrm{if } p \geq 1, \tag{3.12} $$

$$ \|M\|_p = \min\big\{ \mathrm{Tr}(M N^r) : N \in \mathcal{S}_\circ(A) \wedge M \ll N \big\} \quad \textrm{if } p \in (0,1] . \tag{3.13} $$

Furthermore, as a consequence of the Hölder inequality for $p > 1$ we find

$$ \log \mathrm{Tr}(MN) \leq \frac{1}{p} \log \mathrm{Tr}(M^p) + \frac{1}{q} \log \mathrm{Tr}(N^q) \tag{3.14} $$

$$ \leq \log\Big( \frac{1}{p} \mathrm{Tr}(M^p) + \frac{1}{q} \mathrm{Tr}(N^q) \Big), \tag{3.15} $$

where the last inequality follows by the concavity of the logarithm. Hence, we have

$$ \mathrm{Tr}(MN) \leq \frac{1}{p} \mathrm{Tr}(M^p) + \frac{1}{q} \mathrm{Tr}(N^q) \quad \textrm{with equality iff } M^p = N^q, \tag{3.16} $$

which is a matrix trace version of Young’s inequality. Similarly, the reverse Hölder inequality for
$p \in (0,1)$ and $M \ll N$ yields again (3.16) with the inequality reversed.


3.1.2 Dual Norm For States

We have already encountered the norm $\|\cdot\|_*$, which is the dual norm of the operator norm on
linear operators. Given the operational relation between density operators (positive functionals)
and events (positive semi-definite operators), it is natural to consider the following dual norm on
positive functionals:

Definition 3.3. We define the positive cone dual norm as

$$ \|\cdot\|_+ : \mathcal{T}(A) \to \mathbb{R}_+, \quad \omega \mapsto \max_{M \in \mathcal{P}_\bullet(A)} |\mathrm{Tr}(\omega M)| . \tag{3.17} $$


Here we emphasize that the maximization in the definition of the dual norm is only over events in
$\mathcal{P}_\bullet(A)$. In fact, optimizing over operators in $\mathcal{L}_\bullet(A)$ in the above expression yields the Schatten-1
norm as we have seen in (3.11). Thus, we clearly have $\|\xi\|_+ \leq \|\xi\|_1$.

Let us verify that this is indeed a norm according to Definition 3.1. (However, it is not unitarily
invariant.)

Proof. From the definition it is evident that $\|a\xi\|_+ = |a| \cdot \|\xi\|_+$ for every scalar $a \in \mathbb{C}$. Furthermore,
the triangle inequality is a consequence of the fact that

$$ \|\xi + \zeta\|_+ = \max_{M \in \mathcal{P}_\bullet(A)} |\mathrm{Tr}((\xi + \zeta) M)| \leq \max_{M \in \mathcal{P}_\bullet(A)} |\mathrm{Tr}(\xi M)| + \max_{M \in \mathcal{P}_\bullet(A)} |\mathrm{Tr}(\zeta M)| = \|\xi\|_+ + \|\zeta\|_+ \tag{3.18} $$

for every $\xi, \zeta \in \mathcal{T}(A)$. It remains to show that $\|\xi\|_+ \geq 0$ with equality if and only if $\xi = 0$. This
follows from the following lower bound on the dual norm:

$$ \|\xi\|_+ \geq \max_{|v\rangle : \langle v|v\rangle = 1} |\langle v|\xi|v\rangle| = w(\xi) \geq 0 \quad \textrm{with equality only if } \xi = 0 . \tag{3.19} $$

To arrive at (3.19), we chose $M = |v\rangle\langle v|$ and let $w(\cdot)$ denote the numerical radius (see, e.g., Bhatia
[26, Sec. 1.1]). The equality condition is thus inherited from the numerical radius. □

For functionals represented by self-adjoint operators $\xi$, we can explicitly find the
operator that achieves the maximum in (3.17) using the spectral decomposition of $\xi$. Specifically,
we find that the expression is always maximized by the projector $\{\xi \geq 0\}$ or its complement
$\{\xi < 0\}$, namely we want to either sum up all positive or all negative eigenvalues to maximize
the absolute value. The dual norm thus evaluates to

$$ \|\xi\|_+ = \max\big\{ \mathrm{Tr}(\{\xi \geq 0\}\xi),\ -\mathrm{Tr}(\{\xi < 0\}\xi) \big\} . \tag{3.20} $$

This can be further simplified using $\max\{a,b\} = \frac{1}{2}(a + b + |a - b|)$, which yields

$$ \|\xi\|_+ = \frac{1}{2} \mathrm{Tr}\big( (\{\xi \geq 0\} - \{\xi < 0\}) \xi \big) + \frac{1}{2} \big| \mathrm{Tr}\big( (\{\xi \geq 0\} + \{\xi < 0\}) \xi \big) \big| \tag{3.21} $$

$$ = \frac{1}{2} \mathrm{Tr}|\xi| + \frac{1}{2} |\mathrm{Tr}(\xi)| = \frac{1}{2} \|\xi\|_1 + \frac{1}{2} |\mathrm{Tr}(\xi)| . \tag{3.22} $$

Finally, for positive functionals this further simplifies to $\|\omega\|_+ = \|\omega\|_1 = \mathrm{Tr}(\omega)$.


3.2 Trace Distance

We start by introducing a straightforward generalization of the trace distance to general (not
necessarily normalized) states. The definition also makes sense for general trace-class operators,
so we will state the results in their most general form.

Definition 3.4. For $\xi, \zeta \in \mathcal{T}(A)$, we define the generalized trace distance between $\xi$ and
$\zeta$ as $\Delta(\xi, \zeta) := \|\xi - \zeta\|_+$.


This distance is also often called total variation distance in the classical literature. It is a metric
on $\mathcal{T}(A)$, an immediate consequence of the fact that $\|\cdot\|_+$ is a norm.

Definition 3.5. A metric is a functional $\Delta : \mathcal{T}(A) \times \mathcal{T}(A) \to \mathbb{R}_+$ with the following properties.
For any $\xi, \zeta, \kappa \in \mathcal{T}(A)$, it satisfies

Positive-definiteness: $\Delta(\xi, \zeta) \geq 0$ with equality if and only if $\xi = \zeta$.

Symmetry: $\Delta(\xi, \zeta) = \Delta(\zeta, \xi)$.

Triangle inequality: $\Delta(\xi, \zeta) \leq \Delta(\xi, \kappa) + \Delta(\kappa, \zeta)$.


When used with states, the generalized trace distance can be expressed in terms of the trace norm
and the absolute value of the trace using (3.22). This yields

$$ \Delta(\rho, \tau) = \frac{1}{2} \|\rho - \tau\|_1 + \frac{1}{2} |\mathrm{Tr}(\rho - \tau)| . \tag{3.23} $$

Hence the definition reduces to the usual trace distance $\Delta(\rho, \tau) = \frac{1}{2}\|\rho - \tau\|_1$ in case both density
operators have the same trace, for example if $\rho, \tau \in \mathcal{S}_\circ(A)$. More generally, for sub-normalized
states in $\mathcal{S}_\bullet(A)$, we can express the generalized trace distance as

$$ \Delta(\rho, \tau) = \frac{1}{2} \|\hat{\rho} - \hat{\tau}\|_1 = \Delta(\hat{\rho}, \hat{\tau}), \tag{3.24} $$

where $\hat{\rho} = \rho \oplus (1 - \mathrm{Tr}(\rho))$ and $\hat{\tau} = \tau \oplus (1 - \mathrm{Tr}(\tau))$ are block-diagonal. We will use the hat
notation to refer to this construction in the following.
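
The three expressions (3.20), (3.23) and (3.24) for the generalized trace distance can be compared numerically. The following NumPy sketch uses two sub-normalized qubit states chosen for illustration; all function names are ours.

```python
import numpy as np

def trace_norm(X):
    """||X||_1 for a Hermitian X: the sum of absolute eigenvalues."""
    return np.abs(np.linalg.eigvalsh(X)).sum()

def dual_norm_plus(X):
    """Positive cone dual norm via (3.20): the better of the two spectral projectors."""
    lam = np.linalg.eigvalsh(X)
    return max(lam[lam >= 0].sum(), -lam[lam < 0].sum())

def gen_trace_distance(rho, tau):
    """Generalized trace distance via (3.23)."""
    diff = rho - tau
    return 0.5 * trace_norm(diff) + 0.5 * abs(np.trace(diff).real)

def hat(rho):
    """Block-diagonal extension rho (+) (1 - Tr rho) used in (3.24)."""
    d = rho.shape[0]
    out = np.zeros((d + 1, d + 1), dtype=complex)
    out[:d, :d] = rho
    out[d, d] = 1.0 - np.trace(rho).real
    return out

rho = np.array([[0.5, 0.1], [0.1, 0.2]])       # sub-normalized, Tr = 0.7
tau = np.array([[0.3, 0.0], [0.0, 0.6]])       # sub-normalized, Tr = 0.9

via_definition = dual_norm_plus(rho - tau)
via_formula = gen_trace_distance(rho, tau)
via_hat = 0.5 * trace_norm(hat(rho) - hat(tau))
```

All three values agree, which reflects that the extra block of the hat construction contributes exactly the term $\frac{1}{2}|\mathrm{Tr}(\rho - \tau)|$.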

For normalized states $\rho, \tau \in \mathcal{S}_\circ(A)$, this definition expresses the distinguishing advantage in
binary hypothesis testing. Let us consider the task of distinguishing between two hypotheses, $\rho$
and $\tau$, with uniform prior using a single observation. For every event $M \in \mathcal{P}_\bullet(A)$, we consider
the following strategy: we perform the POVM $\{M, I - M\}$ and select $\rho$ in case we measure $M$
and $\tau$ otherwise. Optimizing over all strategies, the probability of selecting the correct state can
be expressed in terms of the distinguishing advantage, $\Delta(\rho, \tau)$, as follows:

$$ p_{\mathrm{corr}}(\rho, \tau) := \max_{M \in \mathcal{P}_\bullet(A)} \Big( \frac{1}{2} \mathrm{Tr}(\rho M) + \frac{1}{2} \mathrm{Tr}(\tau (I - M)) \Big) = \frac{1}{2} \big( 1 + \Delta(\rho, \tau) \big) . \tag{3.25} $$

Like any metric based on a norm, the generalized trace distance is also jointly convex. For all
$\lambda \in [0,1]$, we have

$$ \Delta\big( \lambda \rho_1 + (1-\lambda) \rho_2,\ \lambda \tau_1 + (1-\lambda) \tau_2 \big) \leq \lambda \Delta(\rho_1, \tau_1) + (1-\lambda) \Delta(\rho_2, \tau_2) . \tag{3.26} $$

Moreover, the generalized trace distance contracts when we apply a quantum channel (or any
trace-non-increasing completely positive map) on both states.


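Proposition 3.1. Let $\xi, \zeta \in \mathcal{T}(A)$, and let $\mathcal{F} \in \mathrm{CPTNI}(A,B)$ be a trace-non-increasing CP
map. Then, $\Delta(\mathcal{F}(\xi), \mathcal{F}(\zeta)) \leq \Delta(\xi, \zeta)$.


Proof. Note that if $\mathcal{F} \in \mathrm{CP}(A,B)$ is trace non-increasing, then $\mathcal{F}^\dagger \in \mathrm{CP}(B,A)$ is sub-unital. In
particular, $\mathcal{F}^\dagger$ maps $\mathcal{P}_\bullet(B)$ into $\mathcal{P}_\bullet(A)$. Then,

$$ \Delta(\mathcal{F}(\xi), \mathcal{F}(\zeta)) = \max_{M \in \mathcal{P}_\bullet(B)} |\mathrm{Tr}(M \mathcal{F}(\xi - \zeta))| = \max_{M \in \mathcal{P}_\bullet(B)} |\mathrm{Tr}(\mathcal{F}^\dagger(M)(\xi - \zeta))| \tag{3.27} $$

$$ \leq \max_{M \in \mathcal{P}_\bullet(A)} |\mathrm{Tr}(M(\xi - \zeta))| = \Delta(\xi, \zeta), \tag{3.28} $$

where we used the definition of the norm in (3.17) twice. □

As a special case when we take the map to be a partial trace, this relation yields

$$ \Delta(\rho_A, \tau_A) \leq \min_{\rho_{AB}, \tau_{AB}} \Delta(\rho_{AB}, \tau_{AB}), \tag{3.29} $$

where $\rho_{AB}$ and $\tau_{AB}$ are extensions (e.g. purifications) of $\rho_A$ and $\tau_A$, respectively.


Can we always find two purifications such that (3.29) becomes an equality? To see that this is
in fact not true, consider the following example. If $\rho$ is fully mixed on a qubit and $\tau$ is pure, then
$\Delta(\rho, \tau) = \frac{1}{2}$, but $\Delta(\psi, \vartheta) \geq \frac{1}{\sqrt{2}}$ for all maximally entangled states $\psi$ that purify $\rho$ and product
states $\vartheta$ that purify $\tau$.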
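
The qubit example can be explored numerically. The sketch below, written with NumPy under the assumption that $\tau = |0\rangle\langle 0|$ and that the purifying system is a single qubit, fixes one maximally entangled purification of $\rho$ and samples random product purifications $|0\rangle \otimes |v\rangle$ of $\tau$. Because all purifications of $\rho$ on two qubits differ only by a unitary on the second qubit, sampling $|v\rangle$ already covers the general case.

```python
import numpy as np

def trace_distance(rho, tau):
    """(1/2) ||rho - tau||_1 for normalized states."""
    return 0.5 * np.abs(np.linalg.eigvalsh(rho - tau)).sum()

rho = np.eye(2) / 2                            # fully mixed qubit
tau = np.diag([1.0, 0.0])                      # pure state |0><0|
marginal_distance = trace_distance(rho, tau)

# Maximally entangled purification of rho
psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
psi_op = np.outer(psi, psi.conj())

rng = np.random.default_rng(2)
purified_distances = []
for _ in range(200):
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    v /= np.linalg.norm(v)
    theta = np.kron(np.array([1.0, 0.0]), v)   # product purification |0>|v> of tau
    purified_distances.append(
        trace_distance(psi_op, np.outer(theta, theta.conj())))

smallest = min(purified_distances)             # never drops below 1/sqrt(2)
```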


3.3 Fidelity 


The last observation motivates us to look at other measures of distance between states. Uhlmann’s
fidelity [165] is ubiquitous in quantum information theory and we define it here for general states.


Definition 3.6. For any $\rho, \tau \in \mathcal{S}(A)$, we define the fidelity of $\rho$ and $\tau$ as

$$ F(\rho, \tau) := \big( \mathrm{Tr}\big| \sqrt{\rho} \sqrt{\tau} \big| \big)^2 . \tag{3.30} $$


Next we will discuss a few basic properties of the fidelity, and we will provide further details
when we discuss the minimal quantum Rényi divergence in Section 4.3. In fact, the analysis in
Section 4.3 will reveal that $(\rho, \tau) \mapsto \sqrt{F(\rho, \tau)}$ is jointly concave and non-decreasing when we
apply a CPTP map to both states. The latter property thus also holds for the fidelity itself.

Beyond that, Uhlmann’s theorem [165] states that there always exist purifications with the
same fidelity as their marginals.

Theorem 3.1. For any states $\rho_A, \tau_A \in \mathcal{S}(A)$ and any purification $\rho_{AB} \in \mathcal{S}(AB)$ of $\rho_A$ with
$d_B \geq d_A$, there exists a purification $\tau_{AB} \in \mathcal{S}(AB)$ of $\tau_A$ such that $F(\rho_A, \tau_A) = F(\rho_{AB}, \tau_{AB})$.


In particular, combining this with the fact that the fidelity cannot decrease when we take a partial
trace, we can write

$$ F(\rho_A, \tau_A) = \max_{\tau_{AB} \in \mathcal{S}(AB)} F(\rho_{AB}, \tau_{AB}) = \max_{\phi_{AB}, \vartheta_{AB} \in \mathcal{S}(AB)} \big| \langle \phi_{AB} | \vartheta_{AB} \rangle \big|^2, \tag{3.31} $$

where $\tau_{AB}$ is any extension of $\tau_A$. The latter optimization is over all purifications $|\phi_{AB}\rangle$ of $\rho_A$ and
$|\vartheta_{AB}\rangle$ of $\tau_A$, respectively, and assumes that $d_B \geq d_A$.
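
Uhlmann’s theorem can be made tangible by constructing a pair of purifications that attains the maximum in (3.31). The NumPy sketch below assumes $d_B = d_A = 3$ and builds both purifications from the unnormalized vector $|\Psi\rangle = \sum_x |x\rangle|x\rangle$; the aligning unitary `V` is obtained from a singular value decomposition of $\sqrt{\rho}\sqrt{\tau}$. The construction and names are one standard way to do this and are not taken from the text.

```python
import numpy as np

def psd_sqrt(X):
    """Square root of a positive semi-definite matrix."""
    lam, vecs = np.linalg.eigh(X)
    return (vecs * np.sqrt(np.clip(lam, 0.0, None))) @ vecs.conj().T

def fidelity(rho, tau):
    """F(rho, tau) = (Tr |sqrt(rho) sqrt(tau)|)^2, cf. (3.30)."""
    s = np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(tau), compute_uv=False)
    return s.sum() ** 2

def random_state(d, rng):
    """Random full-rank density matrix (illustrative input only)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

d = 3
rng = np.random.default_rng(4)
rho, tau = random_state(d, rng), random_state(d, rng)

Psi = np.eye(d).reshape(d * d)                 # unnormalized sum_x |x>|x>
u, s, vh = np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(tau))
V = (u @ vh).conj()                            # unitary on B aligning the purifications

ket_rho = np.kron(psd_sqrt(rho), np.eye(d)) @ Psi
ket_tau = np.kron(psd_sqrt(tau), V) @ Psi

uhlmann_overlap = abs(np.vdot(ket_rho, ket_tau)) ** 2
tau_marginal = np.einsum('abcb->ac',
                         np.outer(ket_tau, ket_tau.conj()).reshape(d, d, d, d))
```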

Uhlmann’s theorem has many immediate consequences. For example, for any linear operator
$L \in \mathcal{L}(A)$, we see that

$$ F(L \rho L^\dagger, \tau) = F(\rho, L^\dagger \tau L) \tag{3.32} $$

by using the latter expression in (3.31). This can be generalized further as follows.

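Lemma 3.3. For $\rho, \tau \in \mathcal{S}(A)$ and a pinching $\mathcal{P}$, we have $F(\mathcal{P}(\rho), \tau) = F(\rho, \mathcal{P}(\tau))$.

Proof. By symmetry, it is sufficient to show an inequality in one direction. Let $\sigma_A = \mathcal{P}(\rho_A) = \sum_x P^x \rho_A P^x$
and let $\{\sigma_A^x\}_x$ with $\sigma_A^x = P^x \rho_A P^x$ be a set of orthogonal states. Then, introducing an
auxiliary Hilbert space $A'$ with $d_{A'} = d_A$, we define the projector $\Pi = \sum_x P_A^x \otimes P_{A'}^x$. The state $\sigma_A$
entertains a purification in the support of $\Pi$, namely we can write

$$ |\sigma\rangle_{AA'} = \Pi |\rho\rangle_{AA'} = \sum_x |\sigma^x\rangle_{AA'}, \quad \textrm{where } \sigma^x_{A'} = \mathrm{Tr}_A\big(\sigma^x_{AA'}\big) \tag{3.33} $$

are again mutually orthogonal. Hence, by Uhlmann’s theorem

$$ F(\mathcal{P}(\rho_A), \tau_A) = \max_{\tau_{AA'}} \mathrm{Tr}(\sigma_{AA'} \tau_{AA'}) = \max_{\tau_{AA'}} \mathrm{Tr}(\rho_{AA'} \Pi \tau_{AA'} \Pi) \tag{3.34} $$

$$ \leq \max_{\tau_{AA'}} F\big( \rho_A, \mathrm{Tr}_{A'}(\Pi \tau_{AA'} \Pi) \big), \tag{3.35} $$

where the maximization is over purifications of $\tau_A$. Finally, by choosing a basis $\{|z\rangle_{A'}\}_z$ that
commutes with all projectors $P^x_{A'}$, we find that

$$ \mathrm{Tr}_{A'}(\Pi \tau_{AA'} \Pi) = \sum_{x,y,z} \langle z|_{A'}\, P_A^x \otimes P_{A'}^x\, \tau_{AA'}\, P_A^y \otimes P_{A'}^y\, |z\rangle_{A'} \tag{3.36} $$

$$ = \sum_{x,z} P_A^x \langle z|_{A'} \tau_{AA'} |z\rangle_{A'} P_A^x = \sum_x P_A^x \tau_A P_A^x = \mathcal{P}(\tau_A), \tag{3.37} $$

which concludes the proof. □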

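Finally, we find that the fidelity is concave in each of its arguments.

Lemma 3.4. The functionals $\rho \mapsto F(\rho, \tau)$ and $\tau \mapsto F(\rho, \tau)$ are concave.

Proof. By symmetry it suffices to show concavity of $\rho \mapsto F(\rho, \tau)$. Let $\rho_A^1, \rho_A^2 \in \mathcal{S}_\circ(A)$ and $\lambda \in (0,1)$
such that $\lambda \rho_A^1 + (1-\lambda) \rho_A^2 = \rho_A$. Moreover, let $\tau_{AA'} \in \mathcal{S}_\circ(AA')$ be a fixed purification of
$\tau_A$.

Then, due to Uhlmann’s theorem there exist purifications $\rho^1_{AA'}$ and $\rho^2_{AA'}$ of $\rho_A^1$ and $\rho_A^2$, respectively,
such that the following chain of inequalities holds:

$$ \lambda F(\rho_A^1, \tau_A) + (1-\lambda) F(\rho_A^2, \tau_A) = \lambda \big| \langle \tau_{AA'} | \rho^1_{AA'} \rangle \big|^2 + (1-\lambda) \big| \langle \tau_{AA'} | \rho^2_{AA'} \rangle \big|^2 \tag{3.38} $$

$$ = \langle \tau_{AA'} | \big( \lambda |\rho^1_{AA'}\rangle\langle \rho^1_{AA'}| + (1-\lambda) |\rho^2_{AA'}\rangle\langle \rho^2_{AA'}| \big) | \tau_{AA'} \rangle \tag{3.39} $$

$$ = F\big( \tau_{AA'},\ \lambda |\rho^1_{AA'}\rangle\langle \rho^1_{AA'}| + (1-\lambda) |\rho^2_{AA'}\rangle\langle \rho^2_{AA'}| \big) \tag{3.40} $$

$$ \leq F\big( \tau_A,\ \lambda \rho_A^1 + (1-\lambda) \rho_A^2 \big) . \tag{3.41} $$

The final inequality follows since the fidelity is non-decreasing when we apply a partial trace. □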


3.3.1 Generalized Fidelity 


Before we commence, we define a very useful generalization of the fidelity to sub-normalized
density operators, which we call the generalized fidelity.

Definition 3.7. For $\rho, \tau \in \mathcal{S}_\bullet(A)$, we define the generalized fidelity between $\rho$ and $\tau$ as

$$ F_*(\rho, \tau) := \Big( \mathrm{Tr}\big| \sqrt{\rho} \sqrt{\tau} \big| + \sqrt{(1 - \mathrm{Tr}\rho)(1 - \mathrm{Tr}\tau)} \Big)^2 . \tag{3.42} $$

Uhlmann’s theorem (Theorem 3.1) adapted to the generalized fidelity states that

$$ F_*(\rho, \tau) = \max_{\varphi, \vartheta} F_*(\varphi, \vartheta) = \max_{\vartheta} F_*(\phi, \vartheta), \quad \textrm{where} \tag{3.43} $$

$$ \sqrt{F_*(\varphi, \vartheta)} = |\langle \varphi | \vartheta \rangle| + \sqrt{(1 - \mathrm{Tr}\varphi)(1 - \mathrm{Tr}\vartheta)}, \tag{3.44} $$

and $\varphi$ and $\vartheta$ range over all purifications of $\rho$ and $\tau$, respectively, and $\phi$ is a fixed purification of
$\rho$. Moreover, using the operators $\hat{\rho}$ and $\hat{\tau}$ defined in the preceding section, we can write

$$ F_*(\rho, \tau) = F(\hat{\rho}, \hat{\tau}) = \Big( \mathrm{Tr}\big| \sqrt{\hat{\rho}} \sqrt{\hat{\tau}} \big| \Big)^2 . \tag{3.45} $$

From this representation it also follows that the square root of the generalized fidelity is jointly
concave on $\mathcal{S}_\bullet(A) \times \mathcal{S}_\bullet(A)$, inheriting this property from the fidelity. Moreover, the generalized
fidelity itself is concave in each of its arguments separately due to Lemma 3.4.

The extension to sub-normalized states in Definition 3.7 is chosen diligently so that the generalized
fidelity is non-decreasing when we apply a quantum channel, or more generally a trace
non-increasing CP map.

Proposition 3.2. Let $\rho, \tau \in \mathcal{S}_\bullet(A)$, and let $\mathcal{E}$ be a trace non-increasing CP map. Then,
$F_*(\mathcal{E}(\rho), \mathcal{E}(\tau)) \geq F_*(\rho, \tau)$.

Proof. Recall that a trace non-increasing map $\mathcal{E} \in \mathrm{CP}(A,B)$ can be decomposed into an isometry
$\mathcal{U} \in \mathrm{CP}(A,BC)$ followed by a projection $\Pi \in \mathcal{P}(BC)$ and a partial trace over C according to the
Stinespring dilation representation.

Let us first restrict our attention to CPTP maps $\mathcal{E}$ where $\Pi = I$. We write $\rho'_B = \mathcal{E}[\rho_A]$ and
$\tau'_B = \mathcal{E}[\tau_A]$. From the representation of the fidelity in (3.44) we can immediately deduce that

$$ F_*(\rho_A, \tau_A) = \max_{\varphi_{AD}, \vartheta_{AD}} F_*(\varphi_{AD}, \vartheta_{AD}) = \max_{\varphi_{AD}, \vartheta_{AD}} F_*\big( \mathcal{U}(\varphi_{AD}), \mathcal{U}(\vartheta_{AD}) \big) \tag{3.46} $$

$$ \leq \max_{\varphi_{BCD}, \vartheta_{BCD}} F_*(\varphi_{BCD}, \vartheta_{BCD}) = F_*(\rho'_B, \tau'_B) . \tag{3.47} $$

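The maximizations above are restricted to purifications of $\rho_A$ and $\tau_A$, respectively. The sole inequality
follows since $\mathcal{U}(\varphi_{AD})$ and $\mathcal{U}(\vartheta_{AD})$ are particular purifications of $\rho'_B$ and $\tau'_B$ in $\mathcal{S}_\bullet(BCD)$.
Next, consider a projection $\Pi \in \mathcal{P}(BC)$ with complement $\Pi^\perp = I - \Pi$ and the CPTP map

$$ \mathcal{E} : \rho \mapsto \begin{pmatrix} \Pi \rho \Pi & 0 \\ 0 & \mathrm{Tr}(\Pi^\perp \rho) \end{pmatrix} . \tag{3.48} $$

Applying the inequality for CPTP maps to $\mathcal{E}$, we find

$$ \sqrt{F_*(\rho, \tau)} \leq \big\| \sqrt{\Pi \rho \Pi} \sqrt{\Pi \tau \Pi} \big\|_1 + \sqrt{\mathrm{Tr}(\Pi^\perp \rho)\, \mathrm{Tr}(\Pi^\perp \tau)} \leq \sqrt{F_*(\Pi \rho \Pi, \Pi \tau \Pi)}, \tag{3.49} $$

where we used that $\mathrm{Tr}\rho \leq 1$ and $\mathrm{Tr}\tau \leq 1$ in the last step. □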


The main strength of the generalized fidelity compared to the trace distance lies in the follow¬ 
ing property, which tells us that the inequality in Proposition 3.2 is tight if the map is a partial 
trace. Given two marginal states and an extension of one of these states, we can always find an 
extension of the other state such that the generalized fidelity is preserved by the partial trace. This 
is a simple corollary of Uhlmann’s theorem. 


Corollary 3.1. Let Pab G S^t{AB) and Ta G .5^,(A). Then, there exists an extension Tab such 
that Ft {Pab 7 T4B ) = F* (Pa , Ta ). Moreover, if Pab is pure and dB>dA, then Tab can be chosen 
pure as well. 


Proof. Clearly $F_*(\rho_A,\tau_A) \ge F_*(\rho_{AB},\tau_{AB})$ by Proposition 3.2 for any choice of $\tau_{AB}$. Let us first
treat the case where $\rho_{AB}$ is pure. Using Uhlmann’s theorem in (3.44), we can write

$$F_*(\rho_A,\tau_A) = \max_{\vartheta_{AB}} F_*(\varphi_{AB},\vartheta_{AB}), \quad \text{where } \varphi_{AB} = \rho_{AB}. \tag{3.50}$$

We then take $\tau_{AB}$ to be any maximizer. For the general case, consider a purification $\rho_{ABC}$ of $\rho_{AB}$.
Then, by the above argument there exists a state $\tau_{ABC}$ with $F_*(\rho_{ABC},\tau_{ABC}) = F_*(\rho_A,\tau_A)$. Moreover,
by Proposition 3.2, we have $F_*(\rho_{ABC},\tau_{ABC}) \le F_*(\rho_{AB},\tau_{AB}) \le F_*(\rho_A,\tau_A)$. Hence, all inequalities
must be equalities, which concludes the proof. □
3.4 Purified Distance

The fidelity is not a metric itself, but for example the angular distance [125] and the Bures metric [31]
are metrics. They are respectively defined as

$$A(\rho,\tau) := \arccos\sqrt{F(\rho,\tau)} \quad \text{and} \quad B(\rho,\tau) := \sqrt{2\big(1 - \sqrt{F(\rho,\tau)}\big)}. \tag{3.51}$$

We will now discuss another metric, which we find particularly convenient since it is related to
the minimal trace distance of purifications [67, 134, 156].

Definition 3.8. For $\rho,\tau \in \mathcal{S}_\bullet(A)$, we define the purified distance between $\rho$ and $\tau$ as
$P(\rho,\tau) := \sqrt{1 - F_*(\rho,\tau)}$.


Then, for quantum states $\rho,\tau \in \mathcal{S}_\circ(A)$, using Uhlmann’s theorem we find

$$P(\rho,\tau) = \sqrt{1 - F(\rho,\tau)} = \sqrt{1 - \max_{\varphi,\vartheta} |\langle\varphi|\vartheta\rangle|^2} = \min_{\varphi,\vartheta} \Delta(\varphi,\vartheta). \tag{3.52}$$

Here, $|\varphi\rangle$ and $|\vartheta\rangle$ are purifications of $\rho$ and $\tau$, respectively.

As it is defined in terms of the generalized fidelity, the purified distance inherits many of its
properties. For example, for trace non-increasing CP maps $\mathcal{E}$, we find

$$P(\mathcal{E}(\rho),\mathcal{E}(\tau)) \le P(\rho,\tau). \tag{3.53}$$

Moreover, the purified distance is a metric on the set of sub-normalized states.

Proposition 3.3. The purified distance is a metric on $\mathcal{S}_\bullet(A)$.


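Proof. Let $\rho,\tau,\sigma \in \mathcal{S}_\bullet(A)$. The condition $P(\rho,\tau) = 0$ if and only if $\rho = \tau$ can be verified by
inspection, and symmetry $P(\rho,\tau) = P(\tau,\rho)$ follows from the symmetry of the fidelity.

It remains to show the triangle inequality, $P(\rho,\tau) \le P(\rho,\sigma) + P(\sigma,\tau)$. Using (3.45), the generalized
fidelities between $\rho$, $\tau$ and $\sigma$ can be expressed as fidelities between the corresponding
extensions $\hat\rho$, $\hat\tau$ and $\hat\sigma$. We employ the triangle inequality of the angular distance, which can be
expressed in terms of the purified distance as $A(\hat\rho,\hat\tau) = \arccos\sqrt{F_*(\rho,\tau)} = \arcsin P(\rho,\tau)$. We find

$$P(\rho,\tau) = \sin A(\hat\rho,\hat\tau) \tag{3.54}$$

$$\le \sin\big(A(\hat\rho,\hat\sigma) + A(\hat\sigma,\hat\tau)\big) \tag{3.55}$$

$$= \sin A(\hat\rho,\hat\sigma)\cos A(\hat\sigma,\hat\tau) + \sin A(\hat\sigma,\hat\tau)\cos A(\hat\rho,\hat\sigma) \tag{3.56}$$

$$= P(\rho,\sigma)\sqrt{F_*(\sigma,\tau)} + P(\sigma,\tau)\sqrt{F_*(\rho,\sigma)} \tag{3.57}$$

$$\le P(\rho,\sigma) + P(\sigma,\tau), \tag{3.58}$$

where we employed the trigonometric addition formula to arrive at (3.56). □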
Note that the purified distance is not an intrinsic metric. Given two states $\rho$, $\tau$ with $P(\rho,\tau) \le \varepsilon$
it is in general not possible to find intermediate states $\sigma^\lambda$ with $P(\rho,\sigma^\lambda) = \lambda\varepsilon$ and $P(\sigma^\lambda,\tau) = (1-\lambda)\varepsilon$.
In this sense, the above triangle inequality is not tight. It is thus sometimes useful
to employ the upper bound in (3.57) instead. For example, we find that $P(\rho,\sigma) \le \sin(\varphi)$ and
$P(\sigma,\tau) \le \sin(\vartheta)$ implies

$$P(\rho,\tau) \le \sin(\varphi + \vartheta) < \sin(\varphi) + \sin(\vartheta) \tag{3.59}$$

if $\varphi,\vartheta > 0$ and $\varphi + \vartheta \le \frac{\pi}{2}$.
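The gap between the refined bound (3.57) and the plain triangle inequality (3.58) can be seen in a small numerical sketch. It is restricted to commuting (classical) normalized states, where the fidelity reduces to $(\sum_x \sqrt{p(x)q(x)})^2$; the three distributions and the helper names are illustrative assumptions, not taken from the text.

```python
import math

def fidelity(p, q):
    """Classical fidelity F(p, q) = (sum_x sqrt(p(x) q(x)))^2 of two normalized distributions."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q)) ** 2

def purified_distance(p, q):
    """Purified distance P = sqrt(1 - F); max() guards against rounding slightly below zero."""
    return math.sqrt(max(0.0, 1.0 - fidelity(p, q)))

# Hypothetical example distributions playing the roles of rho, sigma, tau.
rho = [0.7, 0.2, 0.1]
sigma = [0.5, 0.3, 0.2]
tau = [0.2, 0.3, 0.5]

direct = purified_distance(rho, tau)
# Refined bound, right-hand side of (3.57).
refined = (purified_distance(rho, sigma) * math.sqrt(fidelity(sigma, tau))
           + purified_distance(sigma, tau) * math.sqrt(fidelity(rho, sigma)))
# Plain triangle inequality, right-hand side of (3.58).
plain = purified_distance(rho, sigma) + purified_distance(sigma, tau)

print(f"P(rho,tau) = {direct:.4f}, refined bound = {refined:.4f}, triangle bound = {plain:.4f}")
```

For these inputs the refined bound sits strictly between the true distance and the triangle bound, which is the sense in which (3.58) is not tight.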

The purified distance is jointly quasi-convex since it is an anti-monotone function of the square
root of the generalized fidelity, which is jointly concave. Formally, for any $\rho_1,\rho_2,\tau_1,\tau_2 \in \mathcal{S}_\bullet(A)$
and $\lambda \in [0,1]$, we have

$$P\big(\lambda\rho_1 + (1-\lambda)\rho_2,\ \lambda\tau_1 + (1-\lambda)\tau_2\big) \le \max_{i\in\{1,2\}} P(\rho_i,\tau_i). \tag{3.60}$$

The purified distance has simple upper and lower bounds in terms of the generalized trace
distance. This results from a simple reformulation of the Fuchs-van de Graaf inequalities [59]
between the trace distance and the fidelity.

Lemma 3.5. Let $\rho,\tau \in \mathcal{S}_\bullet(A)$. Then, the following inequalities hold:

$$\Delta(\rho,\tau) \le P(\rho,\tau) \le \sqrt{2\Delta(\rho,\tau) - \Delta(\rho,\tau)^2} \le \sqrt{2\Delta(\rho,\tau)}. \tag{3.61}$$

Proof. We first express the quantities using the normalized density operators $\hat\rho$ and $\hat\tau$, i.e.
$P(\rho,\tau) = P(\hat\rho,\hat\tau)$ and $\Delta(\rho,\tau) = \Delta(\hat\rho,\hat\tau)$. Then, the result follows from the inequalities

$$1 - \sqrt{F(\hat\rho,\hat\tau)} \le \Delta(\hat\rho,\hat\tau) \le \sqrt{1 - F(\hat\rho,\hat\tau)} \tag{3.62}$$

between the trace distance and fidelity, which were first shown by Fuchs and van de Graaf [59].

□
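As a sanity check of Lemma 3.5, the following sketch evaluates both chains of inequalities, (3.61) and (3.62), for a pair of classical normalized distributions. In this commuting case the trace distance is half the $\ell_1$ distance; the example distributions and function names are assumptions made for illustration only.

```python
import math

def trace_distance(p, q):
    """Classical trace distance: half the l1-distance between the distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def fidelity(p, q):
    """Classical fidelity F(p, q) = (sum_x sqrt(p(x) q(x)))^2."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q)) ** 2

# Hypothetical normalized example distributions.
p = [0.7, 0.2, 0.1]
q = [0.2, 0.3, 0.5]

delta = trace_distance(p, q)
F = fidelity(p, q)
P = math.sqrt(1.0 - F)  # purified distance of normalized states

# Fuchs-van de Graaf inequalities (3.62).
fvdg_holds = (1.0 - math.sqrt(F)) <= delta <= math.sqrt(1.0 - F)
# Bounds (3.61) on the purified distance.
lemma_holds = delta <= P <= math.sqrt(2 * delta - delta ** 2) <= math.sqrt(2 * delta)

print(fvdg_holds, lemma_holds)
```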


3.5 Background and Further Reading 

We defer to Bhatia’s book [26, Ch. IV] for a comprehensive introduction to matrix norms. Fuchs’
thesis [58] gives a useful overview of distance measures in quantum information. The fidelity
was first investigated by Uhlmann [165] and popularized in quantum information theory
by Jozsa [97], who also gave it its name. Some recent literature (most prominently Nielsen and
Chuang’s standard textbook [125]) defines the fidelity as $\sqrt{F(\cdot,\cdot)}$, also called the square root
fidelity. Here we adopted the historical definition.

The discussion on generalized fidelity and purified distance is based on [152] and [156]. The 
purified distance was independently proposed by Gilchrist et al. [67] and Rastegin [134, 135], 
where it is sometimes called ‘sine distance’. However, in these papers the discussion is restricted 
to normalized states. The name ‘purified distance’ was coined in [156], where the generalization 
to sub-normalized states was first investigated. 








Chapter 4 

Quantum Rényi Divergence


Shannon entropy as well as conditional entropy and mutual information can be compactly expressed
in terms of the relative entropy, or Kullback-Leibler divergence. In this sense, the divergence
can be seen as a parent quantity to entropy, conditional entropy and mutual information,
and many properties of the latter quantities can be derived from properties of the divergence.
Similarly, we will define Rényi entropy, conditional entropy and mutual information in terms of
a parent quantity, the Rényi divergence. We will see in the following chapters that this approach
is very natural and leads to operationally significant measures that have powerful mathematical
properties. This observation allows us to first focus our attention on quantum generalizations of
the Kullback-Leibler and Rényi divergence and explore their properties, which is the topic of this
chapter.

There exist various quantum generalizations of the classical Rényi divergence due to the
non-commutative nature of quantum physics.* Thus, it is prudent to restrict our attention to quantum
generalizations that attain operational significance in quantum information theory. A natural
application of classical Rényi divergence is in hypothesis testing, where error and strong converse
exponents are naturally expressed in terms of the Rényi divergence. In this chapter we focus on
two variants of the quantum Rényi divergence that both attain operational significance in quantum
hypothesis testing. Here we explore their mathematical properties, whereas their application
to hypothesis testing will be reviewed in Chapter 7.


4.1 Classical Rényi Divergence

Before we tackle quantum Rényi divergences, let us first recapitulate some properties of the
classical Rényi divergence they are supposed to generalize. We formulate these properties in the
quantum language, and we will later see that most of them are also satisfied by some quantum
Rényi divergences.


* In fact, uncountably infinite quantum generalizations with interesting mathematical properties can easily be
constructed (see, e.g. [9]).




4.1.1 An Axiomatic Approach 

Alfred Rényi, in his seminal 1961 paper [142], investigated an axiomatic approach to derive the
Shannon entropy [144]. He found that five natural requirements for functionals on a probability
space single out the Shannon entropy, and by relaxing one of these requirements, he found a
family of entropies now named after him.

The requirements can be readily translated to the quantum language. Here we consider general
functionals $\mathbb{D}(\cdot\|\cdot)$ that map a pair of operators $\rho,\sigma \in \mathcal{S}(A)$ with $\rho \neq 0$, $\sigma \gg \rho$ onto the real line.
Rényi’s six axioms naturally translate as follows:

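(I) Continuity: $\mathbb{D}(\rho\|\sigma)$ is continuous in $\rho,\sigma \in \mathcal{S}(A)$, wherever $\rho \neq 0$ and $\sigma \gg \rho$.

(II) Unitary invariance: $\mathbb{D}(\rho\|\sigma) = \mathbb{D}(U\rho U^\dagger\|U\sigma U^\dagger)$ for any unitary $U$.

(III) Normalization: $\mathbb{D}(1\|\tfrac{1}{2}) = \log(2)$.

(IV) Order: If $\rho \ge \sigma$, then $\mathbb{D}(\rho\|\sigma) \ge 0$. And, if $\rho \le \sigma$, then $\mathbb{D}(\rho\|\sigma) \le 0$.

(V) Additivity: $\mathbb{D}(\rho\otimes\tau\|\sigma\otimes\omega) = \mathbb{D}(\rho\|\sigma) + \mathbb{D}(\tau\|\omega)$ for all $\rho,\sigma \in \mathcal{S}(A)$, $\tau,\omega \in \mathcal{S}(B)$ with
$\rho \neq 0$, $\tau \neq 0$.

(VI) General mean: There exists a continuous and strictly monotonic function $g$ such that $\mathbb{Q}(\cdot\|\cdot) :=
g(\mathbb{D}(\cdot\|\cdot))$ satisfies the following. For $\rho,\sigma \in \mathcal{S}(A)$, $\tau,\omega \in \mathcal{S}(B)$,

$$\mathbb{Q}(\rho\oplus\tau\|\sigma\oplus\omega) = \frac{\operatorname{Tr}(\rho)}{\operatorname{Tr}(\rho+\tau)}\cdot\mathbb{Q}(\rho\|\sigma) + \frac{\operatorname{Tr}(\tau)}{\operatorname{Tr}(\rho+\tau)}\cdot\mathbb{Q}(\tau\|\omega). \tag{4.1}$$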

Rényi [142] first shows that (I)-(V) imply $\mathbb{D}(\lambda\|\mu) = \log\lambda - \log\mu$ for two scalars $\lambda,\mu > 0$, a
quantity that is often referred to as the log-likelihood ratio. In fact, the axioms imply the following
constraint, which will be useful later since it allows us to restrict our attention to normalized states.

(III+) Normalization: $\mathbb{D}(a\rho\|b\sigma) = \mathbb{D}(\rho\|\sigma) + \log a - \log b$ for $a,b > 0$.

We also remark that invariance under unitaries (II) is implied by a slightly stronger property,
invariance under isometries.

(II+) Isometric invariance: $\mathbb{D}(\rho\|\sigma) = \mathbb{D}(V\rho V^\dagger\|V\sigma V^\dagger)$ for $\rho,\sigma \in \mathcal{S}(A)$ and any isometry $V$
from $A$ to $B$.

Rényi then considers general continuous and strictly monotonic functions to define a mean
in (VI), such that the resulting quantity is still compatible with (I)-(V). Under the assumption
that the states $\rho_X$ and $\sigma_X$ are classical, he then establishes that Properties (I)-(VI) are satisfied
only by the Kullback-Leibler divergence [103] and the Rényi divergence for $\alpha \in (0,1)\cup(1,\infty)$,
which are respectively given as

$$D(\rho_X\|\sigma_X) = \frac{\sum_x \rho(x)\big(\log\rho(x) - \log\sigma(x)\big)}{\sum_x \rho(x)} \quad \text{with } g : t \mapsto t, \tag{4.2}$$

$$D_\alpha(\rho_X\|\sigma_X) = \frac{1}{\alpha-1}\log\frac{\sum_x \rho(x)^\alpha\sigma(x)^{1-\alpha}}{\sum_x \rho(x)} \quad \text{with } g_\alpha : t \mapsto \exp\big((\alpha-1)t\big). \tag{4.3}$$

These quantities are well-defined if $\rho_X$ and $\sigma_X$ have full support and otherwise we use the convention
that $0\log 0 = 0$ and $\frac{0}{0} = 1$, which ensures that the divergences are indeed continuous whenever
$\rho_X \neq 0$ and $\sigma_X \gg \rho_X$. Finally, note that both quantities diverge to $+\infty$ if the latter condition is not
satisfied and $\alpha > 1$.
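The definitions (4.2) and (4.3) translate directly into code. The sketch below uses natural logarithms and plain Python lists of non-negative weights; the function names and the handling of zero entries (following the conventions stated above) are illustrative choices, not part of the original text. It also checks the normalization axiom (III), the scaling rule (III+), and additivity (V) on small examples.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence (4.2) in nats; p, q are lists of non-negative weights."""
    acc = 0.0
    for a, b in zip(p, q):
        if a > 0:                      # convention 0 log 0 = 0
            if b == 0:
                return math.inf        # support of p is not contained in support of q
            acc += a * (math.log(a) - math.log(b))
    return acc / sum(p)

def renyi_divergence(p, q, alpha):
    """Classical Renyi divergence (4.3) in nats for alpha in (0,1) or (1,inf)."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must lie in (0,1) or (1,inf)")
    acc = 0.0
    for a, b in zip(p, q):
        if a > 0:
            if b == 0:
                if alpha > 1:
                    return math.inf    # diverges when the support condition fails
                continue               # the term vanishes for alpha < 1
            acc += a ** alpha * b ** (1 - alpha)
    if acc == 0:
        return math.inf                # disjoint supports, alpha < 1
    return math.log(acc / sum(p)) / (alpha - 1)

def product(p, t):
    """Weights of the product distribution p (x) t."""
    return [a * c for a in p for c in t]

# Hypothetical example weights.
p, q = [0.7, 0.2, 0.1], [0.2, 0.3, 0.5]
t, w = [0.6, 0.4], [0.5, 0.5]

print(renyi_divergence([1.0], [0.5], 2.0), math.log(2))   # axiom (III)
print(renyi_divergence(product(p, t), product(q, w), 2.0),
      renyi_divergence(p, q, 2.0) + renyi_divergence(t, w, 2.0))  # axiom (V)
```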
Unlike in the classical case, the above axioms do not uniquely determine a quantum generalization
of these divergences. Hence, we first list some additional properties we would like a quantum
generalization of the Rényi divergence to have. These are operationally significant, but
mathematically more involved than the axioms used by Rényi. The classical Rényi divergences satisfy
all these properties.

The two most significant properties from an operational point of view are positive definiteness
and the data-processing inequality. First, positive definiteness ensures that the divergence is
positive for normalized states and vanishes only if both arguments are equal. This allows us to use the
divergence as a measure of distinguishability in place of a metric in some cases, even though it is
not symmetric and does not satisfy a triangle inequality.

(VII) Positive definiteness: If $\rho,\sigma \in \mathcal{S}_\circ(A)$, then $\mathbb{D}(\rho\|\sigma) \ge 0$ with equality iff $\rho = \sigma$.

The data-processing inequality (DPI) ensures the divergence never increases when we apply a
quantum channel to both states. This strengthens the interpretation of the divergence as a measure
of distinguishability: the outputs of a channel are at least as hard to distinguish as the inputs.

(VIII) Data-processing inequality: For any $\mathcal{E} \in \mathrm{CPTP}(A,B)$ and $\rho,\sigma \in \mathcal{S}(A)$, we have

$$\mathbb{D}(\rho\|\sigma) \ge \mathbb{D}(\mathcal{E}(\rho)\|\mathcal{E}(\sigma)). \tag{4.4}$$


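Finally, the following mathematical properties will prove extremely useful. (Note that we expect
that either (IXa) or (IXb) holds, but not both.)

(IXa) Joint convexity (applies only to Rényi divergence with $\alpha > 1$): For sets of normalized states
$\{\rho_i\}_i, \{\sigma_i\}_i \subset \mathcal{S}_\circ(A)$ and a probability mass function $\{\lambda_i\}_i$ such that $\lambda_i \ge 0$ and $\sum_i \lambda_i = 1$, we
have

$$\sum_i \lambda_i\,\mathbb{Q}(\rho_i\|\sigma_i) \ge \mathbb{Q}\Big(\sum_i \lambda_i\rho_i \Big\| \sum_i \lambda_i\sigma_i\Big). \tag{4.5}$$

Consequently, $(\rho,\sigma) \mapsto \mathbb{D}(\rho\|\sigma)$ is jointly quasi-convex, namely

$$\mathbb{D}\Big(\sum_i \lambda_i\rho_i \Big\| \sum_i \lambda_i\sigma_i\Big) \le \max_i \mathbb{D}(\rho_i\|\sigma_i). \tag{4.6}$$

(IXb) Joint concavity (applies only to Rényi divergence with $\alpha < 1$): The inequality (4.5) holds in
the opposite direction, i.e. $(\rho,\sigma) \mapsto \mathbb{Q}(\rho\|\sigma)$ is jointly concave. Moreover, $(\rho,\sigma) \mapsto \mathbb{D}(\rho\|\sigma)$
is jointly convex.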

These properties are interrelated. For example, we clearly have $\mathbb{D}(\rho\|\sigma) \ge 0$ in (VII) if
data-processing holds, since $\mathbb{D}(\rho\|\sigma) \ge \mathbb{D}(\operatorname{Tr}(\rho)\|\operatorname{Tr}(\sigma)) = \mathbb{D}(1\|1) = 0$. Furthermore, $\mathbb{D}(\rho\|\rho) = 0$
follows from (IV). To establish positive definiteness (VII) it in fact suffices to show

(VII-) Definiteness: For $\rho,\sigma \in \mathcal{S}_\circ(A)$, we have $\mathbb{D}(\rho\|\sigma) = 0 \implies \rho = \sigma$.

when (IV) and (VIII) hold. The most important connection is drawn in Proposition 4.2 in Section 4.2,
and establishes that data-processing holds if and only if joint convexity resp. concavity
holds (depending on the value of $\alpha$) for all quantum Rényi divergences. The last property
generalizes the order property (IV) as follows.

(X) Dominance: For states $\rho,\sigma,\sigma' \in \mathcal{S}(A)$ with $\sigma \le \sigma'$, we have $\mathbb{D}(\rho\|\sigma) \ge \mathbb{D}(\rho\|\sigma')$.

Clearly, dominance (X) and positive definiteness (VII) imply order (IV).


In the following we will show that these properties hold for the classical Rényi divergence,
i.e. for the case when the states $\rho$ and $\sigma$ commute. As we have argued above (and will show in
Proposition 4.2), to establish data-processing, it suffices to prove that the KL divergence in (4.2)
and the classical Rényi divergences (4.3) satisfy joint convexity resp. concavity as in (IXa) and
(IXb). For this purpose we will need the following elementary lemma:


Lemma 4.1. If $f$ is convex on positive reals, then $F : (p,q) \mapsto q f\big(\frac{p}{q}\big)$ is jointly convex. Moreover,
if $f$ is strictly convex, then $F$ is strictly convex in $p$ and in $q$.


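Proof. Let $\{\lambda_i\}_i$, $\{p_i\}_i$, $\{q_i\}_i$ be positive reals such that $\sum_i \lambda_i p_i = p$ and $\sum_i \lambda_i q_i = q$. Then,
employing Jensen’s inequality, we find

$$\sum_i \lambda_i q_i f\Big(\frac{p_i}{q_i}\Big) = q\sum_i \frac{\lambda_i q_i}{q} f\Big(\frac{p_i}{q_i}\Big) \ge q f\Big(\sum_i \frac{\lambda_i q_i}{q}\frac{p_i}{q_i}\Big) = q f\Big(\frac{p}{q}\Big). \tag{4.7}$$

The second statement is evident if we fix either $p_i = p$ or $q_i = q$.

□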


This lemma is a generalization of the famous log sum inequality, which we recover using the
convex function $f : t \mapsto t\log t$.
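To make the connection concrete, the following sketch evaluates the perspective function $F(p,q) = q f(p/q)$ of Lemma 4.1 for $f(t) = t\log t$ and checks the resulting log sum inequality, $\sum_i a_i \log\frac{a_i}{b_i} \ge (\sum_i a_i)\log\frac{\sum_i a_i}{\sum_i b_i}$, on arbitrary positive numbers. The inputs and helper names are assumptions chosen for illustration.

```python
import math

def perspective(f, p, q):
    """The function F(p, q) = q * f(p / q) from Lemma 4.1, for positive p and q."""
    return q * f(p / q)

def t_log_t(t):
    """The convex function f(t) = t log t (natural logarithm)."""
    return t * math.log(t)

# Hypothetical positive inputs.
a = [0.4, 1.5, 2.0]
b = [1.0, 0.5, 3.0]

# Left-hand side: sum_i F(a_i, b_i) = sum_i a_i log(a_i / b_i).
lhs = sum(perspective(t_log_t, x, y) for x, y in zip(a, b))
# Right-hand side: F(sum_i a_i, sum_i b_i).
rhs = perspective(t_log_t, sum(a), sum(b))

print(lhs, rhs, lhs >= rhs)
```

Taking all $\lambda_i = 1$ in (4.7) gives exactly this comparison of `lhs` and `rhs`.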

Let us then recall that for normalized $\rho_X,\sigma_X \in \mathcal{S}_\circ(X)$, we have

$$Q_\alpha(\rho_X\|\sigma_X) := g_\alpha\big(D_\alpha(\rho_X\|\sigma_X)\big) = \sum_x \sigma(x)\Big(\frac{\rho(x)}{\sigma(x)}\Big)^\alpha. \tag{4.8}$$

First, note that $Q_\alpha$ has the form of a Csiszár-Morimoto $f$-divergence [39,117], where $f_\alpha : t \mapsto t^\alpha$ is
concave for $\alpha \in (0,1)$ and convex for $\alpha > 1$. Joint convexity resp. concavity of $Q_\alpha$ is then a direct
consequence of Lemma 4.1, which we apply for each summand of the sum over $x$ individually.
By the same argument applied for $f : t \mapsto t\log t$ (i.e. the log sum inequality), we also find that

$$D(\rho_X\|\sigma_X) = \sum_x \rho(x)\log\frac{\rho(x)}{\sigma(x)} \tag{4.9}$$

is jointly convex.

The Rényi divergences satisfy the data-processing inequality (VIII), i.e. $D_\alpha$ is contractive under
application of classical channels to both arguments. This can be shown directly, but since we
have established joint convexity resp. concavity, it also follows from (a classical adaptation of)
Proposition 4.2 below and we thus omit the proof here.

Dominance (X) is evident from the definition. It remains to show definiteness (VII-) and thus
(VII). This is a consequence of the fact that $Q$ and $Q_\alpha$ are strictly convex resp. concave in the
second argument due to Lemma 4.1. Namely, let us assume for the sake of contradiction that
$D(\rho_X\|\rho_X) = D(\rho_X\|\sigma_X) = 0$. Then we get that $D(\rho_X\|\tfrac{1}{2}\rho_X + \tfrac{1}{2}\sigma_X) < 0$ if $\rho_X \neq \sigma_X$, which
contradicts positivity. A similar argument applies to $Q_\alpha$, and we are done.

The Kullback-Leibler divergence and the classical Rényi divergence as defined in (4.2)
and (4.3) satisfy Properties (I)-(X).
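A brief numerical illustration of two of these properties in the classical case: the data-processing inequality (VIII) under a stochastic map, and the joint convexity resp. concavity of $Q_\alpha$ from (IXa) and (IXb). The transition matrix `W`, the distributions, and the mixing weight are hypothetical examples; all entries are strictly positive so that no special conventions are needed.

```python
import math

def Q(p, q, alpha):
    """Q_alpha(p||q) = sum_x p(x)^alpha q(x)^(1-alpha) for strictly positive p, q."""
    return sum(a ** alpha * b ** (1 - alpha) for a, b in zip(p, q))

def renyi(p, q, alpha):
    """Classical Renyi divergence (4.3) in nats for normalized, strictly positive p, q."""
    return math.log(Q(p, q, alpha)) / (alpha - 1)

def apply_channel(W, p):
    """Output of a classical channel; W[y][x] is the probability of output y given input x."""
    return [sum(W[y][x] * p[x] for x in range(len(p))) for y in range(len(W))]

def mix(u, v, lam):
    """Convex combination lam * u + (1 - lam) * v of two distributions."""
    return [lam * a + (1 - lam) * b for a, b in zip(u, v)]

# Hypothetical channel (each column sums to one) and distributions.
W = [[0.9, 0.2, 0.1],
     [0.1, 0.8, 0.9]]
p1, q1 = [0.7, 0.2, 0.1], [0.2, 0.3, 0.5]
p2, q2 = [0.1, 0.1, 0.8], [0.4, 0.4, 0.2]
lam = 0.3

# Data-processing (VIII): the divergence does not increase under the channel.
dpi_ok = all(
    renyi(apply_channel(W, p1), apply_channel(W, q1), al) <= renyi(p1, q1, al) + 1e-12
    for al in (0.3, 0.5, 2.0, 5.0)
)
# Joint convexity of Q_alpha for alpha > 1 (IXa) and joint concavity for alpha < 1 (IXb).
mixed_p, mixed_q = mix(p1, p2, lam), mix(q1, q2, lam)
convex_ok = Q(mixed_p, mixed_q, 2.0) <= lam * Q(p1, q1, 2.0) + (1 - lam) * Q(p2, q2, 2.0) + 1e-12
concave_ok = Q(mixed_p, mixed_q, 0.5) >= lam * Q(p1, q1, 0.5) + (1 - lam) * Q(p2, q2, 0.5) - 1e-12

print(dpi_ok, convex_ok, concave_ok)
```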


4.1.3 Monotonicity in $\alpha$ and Limits

Due to the parametrization in terms of the parameter $\alpha$, we also find the following relation
between different Rényi divergences.

Proposition 4.1. The function $(0,1)\cup(1,\infty) \ni \alpha \mapsto \log Q_\alpha(\rho_X\|\sigma_X)$ is convex for all $\rho_X,\sigma_X \in \mathcal{S}(X)$
with $\rho_X \neq 0$ and $\sigma_X \gg \rho_X$. Moreover, it is strictly convex unless $\rho_X = a\sigma_X$ for some $a > 0$.

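Proof. It is sufficient to show this property for $\rho_X,\sigma_X \in \mathcal{S}_\circ(X)$ due to (III+). We simply evaluate
the second derivative of this function, which is

$$F'' = \frac{Q''_\alpha(\rho_X\|\sigma_X)\,Q_\alpha(\rho_X\|\sigma_X) - Q'_\alpha(\rho_X\|\sigma_X)^2}{Q_\alpha(\rho_X\|\sigma_X)^2}, \tag{4.10}$$

where

$$Q'_\alpha(\rho_X\|\sigma_X) = \sum_x \rho(x)^\alpha\sigma(x)^{1-\alpha}\big(\ln\rho(x) - \ln\sigma(x)\big), \quad \text{and} \tag{4.11}$$

$$Q''_\alpha(\rho_X\|\sigma_X) = \sum_x \rho(x)^\alpha\sigma(x)^{1-\alpha}\big(\ln\rho(x) - \ln\sigma(x)\big)^2. \tag{4.12}$$

Note that $P(x) = \rho(x)^\alpha\sigma(x)^{1-\alpha}/Q_\alpha(\rho_X\|\sigma_X)$ is a probability mass function. Using this, the above
expression can be simplified to

$$F'' = \sum_x P(x)\big(\ln\rho(x) - \ln\sigma(x)\big)^2 - \Big(\sum_x P(x)\big(\ln\rho(x) - \ln\sigma(x)\big)\Big)^2. \tag{4.13}$$

Hence, $F'' \ge 0$ by Jensen’s inequality and the strict convexity of the function $t \mapsto t^2$, with equality
if and only if $\rho(x) = a\sigma(x)$ for all $x$. □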

As a corollary, we find that the Rényi divergences are monotone functions of $\alpha$.


Corollary 4.1. The function $\alpha \mapsto D_\alpha(\rho_X\|\sigma_X)$ is monotonically increasing. Moreover, it is
strictly increasing unless $\rho_X = a\sigma_X$ for some $a > 0$.


Proof. We set $Q_\alpha \equiv Q_\alpha(\rho_X\|\sigma_X)$ to simplify notation and note that $\log Q_1 = 0$. Let us assume
that $\alpha > \beta > 1$ and set $\lambda = \frac{\beta-1}{\alpha-1} \in (0,1)$. Then, by convexity of $\alpha \mapsto \log Q_\alpha$, we have

$$\log Q_\beta = \log Q_{\lambda\alpha + (1-\lambda)} \le \lambda\log Q_\alpha + (1-\lambda)\log Q_1 = \frac{\beta-1}{\alpha-1}\log Q_\alpha. \tag{4.14}$$

This establishes that $D_\alpha(\rho_X\|\sigma_X) \ge D_\beta(\rho_X\|\sigma_X)$, as desired. The inequality is strict unless $\rho_X = \sigma_X$,
as we have seen in Proposition 4.1.

For $1 > \alpha > \beta$, an analogous argument with $\lambda = \frac{1-\alpha}{1-\beta}$ establishes that $\log Q_\alpha \le \frac{1-\alpha}{1-\beta}\log Q_\beta$,
which again yields $D_\alpha(\rho_X\|\sigma_X) \ge D_\beta(\rho_X\|\sigma_X)$ taking into account the sign of the prefactor. □
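Both statements, convexity of $\alpha \mapsto \log Q_\alpha$ (Proposition 4.1) and monotonicity of $\alpha \mapsto D_\alpha$ (Corollary 4.1), can be observed numerically. The sketch below uses a hypothetical pair of normalized distributions and a small grid of orders; the grid and the midpoint test for convexity are illustrative choices.

```python
import math

def log_Q(p, q, alpha):
    """log Q_alpha(p||q) in nats for normalized, strictly positive p and q."""
    return math.log(sum(a ** alpha * b ** (1 - alpha) for a, b in zip(p, q)))

def renyi(p, q, alpha):
    """Classical Renyi divergence D_alpha = log Q_alpha / (alpha - 1)."""
    return log_Q(p, q, alpha) / (alpha - 1)

# Hypothetical normalized example distributions.
p = [0.7, 0.2, 0.1]
q = [0.2, 0.3, 0.5]

alphas = [0.1, 0.5, 0.9, 1.1, 2.0, 5.0, 20.0]
values = [renyi(p, q, a) for a in alphas]

# Corollary 4.1: D_alpha is non-decreasing along the grid.
monotone = all(x <= y + 1e-12 for x, y in zip(values, values[1:]))

# Proposition 4.1: midpoint convexity of alpha -> log Q_alpha on pairs of grid points.
midpoint_convex = all(
    log_Q(p, q, 0.5 * (a + b)) <= 0.5 * (log_Q(p, q, a) + log_Q(p, q, b)) + 1e-12
    for a in alphas for b in alphas
)

for a, v in zip(alphas, values):
    print(f"alpha = {a:5.1f}  D_alpha = {v:.4f}")
print(monotone, midpoint_convex)
```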

Since we have now established that $D_\alpha$ is continuous in $\alpha$ for $\alpha \in (0,1)\cup(1,\infty)$, it will be
interesting to take a look at the limits as $\alpha$ approaches $0$, $1$ and $\infty$. First, a direct application of
l’Hôpital’s rule yields

$$\lim_{\alpha\searrow 1} D_\alpha(\rho_X\|\sigma_X) = \lim_{\alpha\nearrow 1} D_\alpha(\rho_X\|\sigma_X) = D(\rho_X\|\sigma_X). \tag{4.15}$$

So in fact the KL divergence is a limiting case of the Rényi divergences and we consequently
define $D_1(\rho_X\|\sigma_X) := D(\rho_X\|\sigma_X)$. In the limit $\alpha \to \infty$, we find

$$D_\infty(\rho_X\|\sigma_X) := \lim_{\alpha\to\infty} D_\alpha(\rho_X\|\sigma_X) = \max_x \log\frac{\rho(x)}{\sigma(x)}, \tag{4.16}$$

which is the maximum log-likelihood ratio. We call this the max-divergence, and note that it
satisfies all the properties except the general mean property (VI). However, the max-divergence
instead satisfies

$$D_\infty(\rho\oplus\tau\|\sigma\oplus\omega) = \max\{D_\infty(\rho\|\sigma),\, D_\infty(\tau\|\omega)\}. \tag{4.17}$$

The limit $\alpha \to 0$ is less interesting because it leads to the expression

$$D_0(\rho_X\|\sigma_X) := \lim_{\alpha\to 0} D_\alpha(\rho_X\|\sigma_X) = -\log\sum_{x:\,\rho(x)>0} \sigma(x), \tag{4.18}$$

which is discontinuous in $\rho_X$ and thus does not satisfy (I). Hence, we hereafter consider $D_\alpha$ with
$\alpha > 0$ as a single continuous one-parameter family of divergences.
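The three limits (4.15), (4.16) and (4.18) can be approximated by evaluating $D_\alpha$ at orders close to $1$, at a large order, and at an order close to $0$. The sketch below does this for hypothetical distributions and also shows the discontinuity of $D_0$: moving a tiny amount of weight onto a previously empty outcome changes the value by a finite amount. The specific orders and tolerances are illustrative assumptions.

```python
import math

def renyi(p, q, alpha):
    """Classical Renyi divergence (4.3) in nats; q strictly positive, zero entries of p skipped."""
    s = sum(a ** alpha * b ** (1 - alpha) for a, b in zip(p, q) if a > 0)
    return math.log(s / sum(p)) / (alpha - 1)

def kl(p, q):
    """Kullback-Leibler divergence (4.2) in nats."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0) / sum(p)

def max_divergence(p, q):
    """Max-divergence (4.16): the largest log-likelihood ratio."""
    return max(math.log(a / b) for a, b in zip(p, q) if a > 0)

def zero_divergence(p, q):
    """Limit alpha -> 0 from (4.18): minus the log of the q-weight on the support of p."""
    return -math.log(sum(b for a, b in zip(p, q) if a > 0))

# Hypothetical normalized example distributions.
p, q = [0.7, 0.2, 0.1], [0.2, 0.3, 0.5]

near_one = (renyi(p, q, 1 - 1e-6), renyi(p, q, 1 + 1e-6))   # both close to kl(p, q)
large = renyi(p, q, 200.0)                                   # approaches max_divergence from below

p_gap = [0.5, 0.5, 0.0]                  # no weight on the third outcome
p_perturbed = [0.5, 0.5 - 1e-9, 1e-9]    # tiny weight moved onto it
small = renyi(p_gap, q, 1e-6)
small_perturbed = renyi(p_perturbed, q, 1e-6)

print(near_one, kl(p, q))
print(large, max_divergence(p, q))
print(small, zero_divergence(p_gap, q), small_perturbed)
```

The last line shows the jump: the unperturbed value is close to $\log 2$, while the perturbed one is close to zero.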

Monotonicity of $D_\alpha$ is not the only byproduct of the convexity of $\log Q_\alpha$. For example, we
also find that

$$\lambda D_{1+\lambda}(\rho\|\sigma) + (1-\lambda) D_\infty(\rho\|\sigma) \ge D_2(\rho\|\sigma) \tag{4.19}$$

for $\lambda \in [0,1]$ and various similar relations.


4.2 Classifying Quantum Rényi Divergences

Clearly, we expect suitable quantum Rényi divergences to have the properties discussed in the
previous section.


Definition 4.1. A quantum Rényi divergence is a quantity $\mathbb{D}(\cdot\|\cdot)$ that satisfies Properties
(I)-(X) in Sections 4.1.1. (It either satisfies (IXa) or (IXb).)
A family of quantum Rényi divergences is a one-parameter family $\alpha \mapsto \mathbb{D}_\alpha(\cdot\|\cdot)$ of
quantum Rényi divergences such that Corollary 4.1 in Section 4.1.3 holds on some open
interval containing 1.

Before we discuss two specific families of Rényi divergences in Sections 4.3 and 4.4, let us
first make a few observations that apply more generally to all quantum Rényi divergences.


4.2.1 Joint Concavity and Data-Processing

First, the following observation relates joint convexity resp. concavity and data-processing for all
quantum Rényi divergences. It establishes that for functionals satisfying (I)-(VI), these properties
are equivalent.

Proposition 4.2. Let $\mathbb{D}$ be a functional satisfying (I)-(VI) and let $g$ and $\mathbb{Q}$ be defined as in
(VI). Then, the following two statements are equivalent.

(1) $\mathbb{Q}$ is jointly convex (IXa) if $g$ is monotonically increasing, or jointly concave (IXb) if $g$ is
monotonically decreasing.
(2) $\mathbb{D}$ satisfies the data-processing inequality (VIII).


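Proof. First, we show (1) $\implies$ (2). Note that the axioms enforce that $\mathbb{Q}$ is invariant under isometries
and consulting the Stinespring dilation, it thus remains to show that the data-processing
inequality is satisfied for the partial trace operation. For the case where $\mathbb{Q}$ is jointly convex, we
thus need to show that $\mathbb{Q}(\rho_{AB}\|\sigma_{AB}) \ge \mathbb{Q}(\rho_A\|\sigma_A)$ for $\rho_{AB},\sigma_{AB} \in \mathcal{S}(AB)$, where $A$ and $B$ are arbitrary
quantum systems.

To show this, consider a unitary basis of $\mathcal{L}(B)$, for example the generalized Pauli operators
$\{X_B^l Z_B^m\}_{l,m}$, where $l,m \in [d_B]$. These act on the computational basis as

$$X_B|k\rangle = |k+1 \bmod d_B\rangle \quad \text{and} \quad Z_B|k\rangle = e^{\frac{2\pi i k}{d_B}}|k\rangle. \tag{4.20}$$

(If we only consider classical distributions, we can set $Z_B = I_B$.) Then, after collecting these
operators in a set $\{U_i = X_B^l Z_B^m\}_i$ with a single index $i = (l,m)$, a short calculation reveals that

$$\frac{1}{d_B^2}\sum_i U_i\,\xi_{AB}\,U_i^\dagger = \xi_A \otimes \pi_B \tag{4.21}$$

for any $\xi_{AB} \in \mathcal{S}(AB)$, where $\pi_B = I_B/d_B$ denotes the fully mixed state. Consequently, unitary invariance and joint convexity yield

$$\mathbb{Q}(\rho_{AB}\|\sigma_{AB}) = \frac{1}{d_B^2}\sum_i \mathbb{Q}\big(U_i\rho_{AB}U_i^\dagger \big\| U_i\sigma_{AB}U_i^\dagger\big) \tag{4.22}$$

$$\ge \mathbb{Q}(\rho_A\otimes\pi_B \| \sigma_A\otimes\pi_B). \tag{4.23}$$

Finally, $\mathbb{Q}(\rho_A\otimes\pi_B\|\sigma_A\otimes\pi_B) = \mathbb{Q}(\rho_A\|\sigma_A)$ by Properties (IV) and (V). Analogously, joint
concavity of $\mathbb{Q}$ implies data-processing for $-\mathbb{Q}$, and thus $\mathbb{D}$.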
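The averaging step has a simple classical analogue, mentioned in the parenthetical remark above: with $Z_B = I_B$, averaging a joint distribution over the $d_B$ cyclic shifts of the $B$ register yields the product of the $A$-marginal with the uniform distribution on $B$. The sketch below verifies this with exact rational arithmetic; the joint table `p_AB` is a hypothetical example, and in this classical setting the average runs over $d_B$ shifts rather than $d_B^2$ operators.

```python
from fractions import Fraction

def shift_B(p_AB, l):
    """Apply X_B^l to a joint table p_AB[a][b]: the weight at index b moves to (b + l) mod d_B."""
    d_B = len(p_AB[0])
    return [[row[(b - l) % d_B] for b in range(d_B)] for row in p_AB]

# Hypothetical joint distribution on A (2 values) and B (3 values).
p_AB = [[Fraction(1, 10), Fraction(2, 10), Fraction(0)],
        [Fraction(3, 10), Fraction(1, 10), Fraction(3, 10)]]
d_A, d_B = len(p_AB), len(p_AB[0])

# Average over all cyclic shifts of the B register.
shifted = [shift_B(p_AB, l) for l in range(d_B)]
averaged = [[sum(s[a][b] for s in shifted) / d_B for b in range(d_B)] for a in range(d_A)]

# Expected result: marginal on A times the uniform distribution on B.
p_A = [sum(row) for row in p_AB]
expected = [[p_A[a] / d_B for _ in range(d_B)] for a in range(d_A)]

print(averaged == expected)
```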

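Next, we show that (2) $\implies$ (1). Consider $\rho,\sigma,\tau,\omega \in \mathcal{S}_\circ(A)$ and $\lambda \in (0,1)$. Then, the
data-processing inequality implies that

$$\mathbb{D}\big(\lambda\rho + (1-\lambda)\tau \,\big\|\, \lambda\sigma + (1-\lambda)\omega\big) \le \mathbb{D}\big(\lambda\rho \oplus (1-\lambda)\tau \,\big\|\, \lambda\sigma \oplus (1-\lambda)\omega\big). \tag{4.24}$$

If $g$ is monotonically increasing, we find that

$$g\Big(\mathbb{D}\big(\lambda\rho + (1-\lambda)\tau \,\big\|\, \lambda\sigma + (1-\lambda)\omega\big)\Big) \tag{4.25}$$

$$\le g\Big(\mathbb{D}\big(\lambda\rho \oplus (1-\lambda)\tau \,\big\|\, \lambda\sigma \oplus (1-\lambda)\omega\big)\Big) \tag{4.26}$$

$$= \lambda\, g\big(\mathbb{D}(\lambda\rho\|\lambda\sigma)\big) + (1-\lambda)\, g\big(\mathbb{D}((1-\lambda)\tau\|(1-\lambda)\omega)\big) \tag{4.27}$$

$$= \lambda\, g\big(\mathbb{D}(\rho\|\sigma)\big) + (1-\lambda)\, g\big(\mathbb{D}(\tau\|\omega)\big), \tag{4.28}$$

where we used property (VI) for the first equality and (V) and (IV) for the last. It follows that
$\mathbb{Q}(\cdot\|\cdot)$ is jointly convex. An analogous argument yields joint concavity if $g$ is decreasing.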


4.2.2 Minimal Quantum Rényi Divergence

Let us assume a quantum Rényi divergence satisfies additivity (V) and the data-processing
inequality (VIII). Then, for any pair of states $\rho$ and $\sigma$ and their $n$-fold products, $\rho^{\otimes n}$ and $\sigma^{\otimes n}$, we
have

$$\mathbb{D}_\alpha(\rho\|\sigma) = \frac{1}{n}\mathbb{D}_\alpha(\rho^{\otimes n}\|\sigma^{\otimes n}) \ge \frac{1}{n}D_\alpha\big(\mathcal{P}_{\sigma^{\otimes n}}(\rho^{\otimes n})\big\|\sigma^{\otimes n}\big), \tag{4.29}$$

where $\mathcal{P}_\sigma(\cdot)$ is the pinching channel discussed in Section 2.6.3 and the quantity on the right-hand
side is evaluated for two commuting and hence classical states.

So, in particular, a quantum Rényi divergence with property (V) and (VIII) that generalizes
$D_\alpha$ must satisfy

$$\mathbb{D}_\alpha(\rho\|\sigma) \ge \lim_{n\to\infty}\frac{1}{n}D_\alpha\big(\mathcal{P}_{\sigma^{\otimes n}}(\rho^{\otimes n})\big\|\sigma^{\otimes n}\big) \tag{4.30}$$

$$= \frac{1}{\alpha-1}\log\operatorname{Tr}\Big(\big(\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\big)^\alpha\Big). \tag{4.31}$$

The proof of the last equality is non-trivial and will be the topic of Section 4.3.1.

Conversely, this inequality is a necessary but not a sufficient condition for additivity and
data-processing. Potentially tighter lower bounds are possible, for example by maximizing over all
possible measurement maps on $n$ systems on the right-hand side. However, we will see in the next
section that the minimal quantum Rényi divergence (also known as sandwiched Rényi divergence),
defined as the expression in (4.31), has all the desired properties of a quantum Rényi divergence
for a large range of $\alpha$.
A general upper bound can be found by considering a preparation map, using Matsumoto’s elegant
construction [113]. For two fixed states $\rho$ and $\sigma$, consider the operator $\Delta = \sigma^{-\frac12}\rho\,\sigma^{-\frac12}$ with
spectral decomposition

$$\Delta = \sum_x \lambda_x\Pi_x, \quad \text{as well as} \quad q(x) = \operatorname{Tr}(\sigma\Pi_x), \quad p(x) = \lambda_x q(x). \tag{4.32}$$

Then, the CPTP map $\Lambda(\cdot) = \sum_x \langle x|\cdot|x\rangle\,\frac{1}{q(x)}\sqrt{\sigma}\,\Pi_x\sqrt{\sigma}$ satisfies

$$\Lambda(p) = \sum_x \lambda_x\sqrt{\sigma}\,\Pi_x\sqrt{\sigma} = \rho, \qquad \Lambda(q) = \sum_x \sqrt{\sigma}\,\Pi_x\sqrt{\sigma} = \sigma. \tag{4.33}$$

Hence, any quantum generalization of the Rényi divergence $\mathbb{D}_\alpha$ with data-processing (VIII) must
satisfy

$$\mathbb{D}_\alpha(\rho\|\sigma) \le D_\alpha(p\|q) = \frac{1}{\alpha-1}\log\operatorname{Tr}\Big(\sigma^{\frac12}\big(\sigma^{-\frac12}\rho\,\sigma^{-\frac12}\big)^\alpha\sigma^{\frac12}\Big). \tag{4.34}$$

We call the quantity on the right-hand side of (4.34) the maximal quantum Rényi divergence.
For $\alpha \in (0,1)$, the term in the trace evaluates to a mean [102]. Specifically, for $\alpha = \frac12$ the
right-hand side of (4.34) evaluates to $-2\log\operatorname{Tr}(\rho\#\sigma)$, where ‘#’ denotes the geometric mean. These
means are jointly concave and thus we also satisfy a data-processing inequality. Furthermore,
$D_2(p\|q) = \log\operatorname{Tr}(\rho^2\sigma^{-1})$ is an upper bound on $\mathbb{D}_2(\rho\|\sigma)$, and in the limit $\alpha \to 1$ we find that

$$\mathbb{D}_1(\rho\|\sigma) \le \operatorname{Tr}\Big(\sigma^{\frac12}\rho\,\sigma^{-\frac12}\log\big(\sigma^{-\frac12}\rho\,\sigma^{-\frac12}\big)\Big) = \operatorname{Tr}\Big(\rho\log\big(\rho^{\frac12}\sigma^{-1}\rho^{\frac12}\big)\Big). \tag{4.35}$$

The last equality follows from (2.45) and the expression on the right is the Belavkin-Staszewski
relative entropy [17]. In spite of its appealing form, the maximal quantum Rényi divergence has
not found many applications yet, and we will not consider it further in this text.

The minimal and maximal Rényi divergences are compared in Figure 4.1.
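For a concrete feeling for the lower bound (4.31) and the upper bound (4.34), the following sketch evaluates both expressions, together with the Petz functional $\operatorname{Tr}(\rho^\alpha\sigma^{1-\alpha})$ that appears in Figure 4.1, for a pair of non-commuting qubit states. To stay within the standard library it is restricted to real symmetric, full-rank $2\times 2$ matrices, whose powers are computed from a closed-form eigendecomposition; the states `rho` and `sigma` and all helper names are hypothetical, and the orders are limited to $(0,2]$.

```python
import math

def mat_mul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mat_pow(M, power):
    """Real power of a symmetric positive definite 2x2 matrix via its eigendecomposition."""
    a, b, d = M[0][0], 0.5 * (M[0][1] + M[1][0]), M[1][1]   # symmetrize off-diagonal entries
    mean, half_diff = 0.5 * (a + d), 0.5 * (a - d)
    r = math.hypot(half_diff, b)
    theta = 0.5 * math.atan2(b, half_diff)   # eigenvector (cos, sin) belongs to mean + r
    cs, sn = math.cos(theta), math.sin(theta)
    hi, lo = (mean + r) ** power, (mean - r) ** power
    off = (hi - lo) * cs * sn
    return [[hi * cs * cs + lo * sn * sn, off], [off, hi * sn * sn + lo * cs * cs]]

def trace(M):
    return M[0][0] + M[1][1]

def minimal(rho, sigma, alpha):
    """Expression (4.31) for normalized rho."""
    S = mat_pow(sigma, (1 - alpha) / (2 * alpha))
    return math.log(trace(mat_pow(mat_mul(mat_mul(S, rho), S), alpha))) / (alpha - 1)

def petz(rho, sigma, alpha):
    """1/(alpha-1) log Tr(rho^alpha sigma^(1-alpha))."""
    return math.log(trace(mat_mul(mat_pow(rho, alpha), mat_pow(sigma, 1 - alpha)))) / (alpha - 1)

def maximal(rho, sigma, alpha):
    """Right-hand side of (4.34)."""
    R = mat_pow(sigma, -0.5)
    inner = mat_pow(mat_mul(mat_mul(R, rho), R), alpha)
    return math.log(trace(mat_mul(sigma, inner))) / (alpha - 1)

# Hypothetical full-rank, non-commuting qubit states with unit trace.
rho = [[0.8, 0.1], [0.1, 0.2]]
sigma = [[0.5, -0.2], [-0.2, 0.5]]

for alpha in (0.5, 1.5, 2.0):
    print(alpha, minimal(rho, sigma, alpha), petz(rho, sigma, alpha), maximal(rho, sigma, alpha))
```

For each listed order the three values appear in increasing order, with the Petz and maximal values coinciding at $\alpha = 2$.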


4.2.4 Quantum Max-Divergence

The bounds in the previous subsection are not sufficient to single out a unique quantum
generalization of the Rényi divergence for general $\alpha$ (and neither are the other desirable properties
discussed above), except in the limit $\alpha \to \infty$, where the lower bound in (4.31) and upper bound
in (4.34) converge. Hence, the max-divergence has a unique quantum generalization.

Let us verify this now. First note that for $\alpha \to \infty$ Eq. (4.34) yields

$$\mathbb{D}_\infty(\rho\|\sigma) \le D_\infty(p\|q) = \max_x \log\lambda_x = \log\|\Delta\|_\infty = \inf\{\lambda : \rho \le \exp(\lambda)\sigma\}. \tag{4.36}$$

So let us thus define the quantum max-divergence as follows [41,139]:
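[Figure 4.1: plot not reproduced.] The minimal, Petz, and maximal quantum Rényi divergences are given by the relation
$D_\alpha(\rho\|\sigma) = \frac{1}{\alpha-1}\log Q_\alpha$ with the respective functionals

$$\widetilde{Q}_\alpha = \operatorname{Tr}\Big(\big(\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\big)^\alpha\Big), \quad Q_\alpha = \operatorname{Tr}\big(\rho^\alpha\sigma^{1-\alpha}\big), \quad \text{and} \quad \widehat{Q}_\alpha = \operatorname{Tr}\Big(\sigma^{\frac12}\big(\sigma^{-\frac12}\rho\,\sigma^{-\frac12}\big)^\alpha\sigma^{\frac12}\Big).$$

Fig. 4.1 Minimal, Petz and maximal quantum Rényi entropy (for small $\alpha$). These divergences are discussed in
Section 4.3, Section 4.4, and Section 4.2.3, respectively. Solid lines are used to indicate that the quantity satisfies
the data-processing inequality in this range of $\alpha$.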


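Definition 4.2. For any $\rho,\sigma \in \mathcal{S}(A)$, we define the quantum max-divergence as

$$D_\infty(\rho\|\sigma) := \inf\{\lambda : \rho \le \exp(\lambda)\sigma\}, \tag{4.37}$$

where we follow the usual convention that $\inf\emptyset = \infty$.

Using the pinching inequality (2.61), we find that

$$\rho \le \exp(\lambda)\sigma \implies \mathcal{P}_\sigma(\rho) \le \exp(\lambda)\sigma, \tag{4.38}$$

$$\mathcal{P}_\sigma(\rho) \le \exp(\lambda)\sigma \implies \rho \le |\mathrm{spec}(\sigma)|\exp(\lambda)\sigma, \tag{4.39}$$

and, thus, the quantum max-divergence satisfies

$$D_\infty(\mathcal{P}_\sigma(\rho)\|\sigma) \le D_\infty(\rho\|\sigma) \le D_\infty(\mathcal{P}_\sigma(\rho)\|\sigma) + \log|\mathrm{spec}(\sigma)|. \tag{4.40}$$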

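We now apply this to $n$-fold product states $\rho^{\otimes n}$ and $\sigma^{\otimes n}$ and use the fact that $|\mathrm{spec}(\sigma^{\otimes n})| \le
(n+1)^{d-1}$ grows at most polynomially in $n$, such that

$$0 \le \lim_{n\to\infty}\frac{1}{n}\log|\mathrm{spec}(\sigma^{\otimes n})| \le \lim_{n\to\infty}\frac{d-1}{n}\log(n+1) = 0. \tag{4.41}$$

The term thus vanishes asymptotically as $n \to \infty$, which means that

$$\frac{1}{n}D_\infty(\rho^{\otimes n}\|\sigma^{\otimes n}) \quad \text{and} \quad \frac{1}{n}D_\infty\big(\mathcal{P}_{\sigma^{\otimes n}}(\rho^{\otimes n})\big\|\sigma^{\otimes n}\big) \tag{4.42}$$

are asymptotically equivalent. Further using that $D_\infty$ is additive, we establish that

$$D_\infty(\rho\|\sigma) = \lim_{n\to\infty}\frac{1}{n}D_\infty\big(\mathcal{P}_{\sigma^{\otimes n}}(\rho^{\otimes n})\big\|\sigma^{\otimes n}\big). \tag{4.43}$$

This argument is in fact a special case of the discussion that we will follow in Section 4.3.1 for
general Rényi divergences.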
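The polynomial growth of the number of distinct eigenvalues of $\sigma^{\otimes n}$ is easy to observe: every eigenvalue of $\sigma^{\otimes n}$ is a product of $n$ eigenvalues of $\sigma$ and depends only on how often each one occurs. The sketch below counts distinct products exactly with rational arithmetic and compares the count to a bound of the form $(n+1)^{d-1}$, where $d$ is taken to be the number of eigenvalues of $\sigma$ counted with multiplicity; the spectrum used is a hypothetical example.

```python
import math
from fractions import Fraction
from itertools import combinations_with_replacement

def tensor_power_spectrum_size(eigs, n):
    """Number of distinct eigenvalues of the n-fold tensor power, given the eigenvalues of sigma."""
    return len({math.prod(c) for c in combinations_with_replacement(eigs, n)})

# Hypothetical spectrum of a state sigma on a three-dimensional system.
eigs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
d = len(eigs)

for n in range(1, 9):
    size = tensor_power_spectrum_size(eigs, n)
    bound = (n + 1) ** (d - 1)
    # The normalized logarithm of the bound is the quantity that vanishes as n grows.
    print(n, size, bound, round(math.log(bound) / n, 4))
```

Since the count is at most polynomial in $n$, the correction term $\frac{1}{n}\log|\mathrm{spec}(\sigma^{\otimes n})|$ tends to zero.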

Hence, Eq. (4.31) yields that $\mathbb{D}_\infty(\rho\|\sigma) \ge D_\infty(\rho\|\sigma)$ for any quantum generalization of the
max-divergence satisfying data-processing and additivity. We summarize these findings as follows:

Proposition 4.3. $D_\infty$ is the unique quantum generalization of the max-divergence that satisfies
additivity (V) and data-processing (VIII).

We leave it as an exercise for the reader to verify that the quantum max-divergence also
satisfies Properties (I)-(X).


4.3 Minimal Quantum Rényi Divergence

In this section we further discuss the minimal quantum Rényi divergence mentioned in Section 4.2.2.
In particular, we will see that the following closed formula for the minimal quantum
Rényi divergence corresponds to the limit in (4.30) for all $\alpha$.

Definition 4.3. Let $\alpha \in (0,1)\cup(1,\infty)$, and $\rho,\sigma \in \mathcal{S}(A)$ with $\rho \neq 0$. Then we define the
minimal quantum Rényi divergence of $\sigma$ with $\rho$ as

$$\widetilde{D}_\alpha(\rho\|\sigma) := \begin{cases} \dfrac{1}{\alpha-1}\log\dfrac{\big\|\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\big\|_\alpha^\alpha}{\operatorname{Tr}(\rho)} & \text{if } (\alpha < 1 \wedge \rho \not\perp \sigma) \vee \rho \ll \sigma \\[2ex] +\infty & \text{else.} \end{cases} \tag{4.44}$$

Moreover, $\widetilde{D}_0$, $\widetilde{D}_1$ and $\widetilde{D}_\infty$ are defined as limits of $\widetilde{D}_\alpha$ for $\alpha \to \{0,1,\infty\}$.

In Section 4.3.2 we will see that $\widetilde{D}_\infty(\rho\|\sigma) = D_\infty(\rho\|\sigma)$ (cf. Definition 4.2).

The minimal quantum Rényi divergence is also called ‘quantum Rényi divergence’ [122] and
‘sandwiched quantum Rényi relative entropy’ [175] in the literature, but we propose here to call
it minimal quantum Rényi divergence since it is the smallest quantum Rényi divergence that still
satisfies the crucial data-processing inequality as seen in (4.31). Thus, it is the minimal quantum
Rényi divergence for which we can expect operational significance.

By inspection, it is evident that this quantity satisfies isometric invariance (II+), normalization
(III+), additivity (V), and general mean (VI). Continuity (I) also holds, but one has to be a bit
more careful since we are employing the generalized inverse in the definition. (See [122] for a
proof of continuity when the rank of $\rho$ or $\sigma$ changes.)
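4.3.1 Pinching Inequalities

The goal of this section is to establish that $\widetilde{D}_\alpha$ is contractive under pinching maps and can be
asymptotically achieved by the respective pinched quantity. For this purpose, let us investigate
some properties of

$$\widetilde{Q}_\alpha(\rho\|\sigma) = \big\|\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\big\|_\alpha^\alpha = \operatorname{Tr}\Big(\big(\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\big)^\alpha\Big) \tag{4.45}$$

$$= \operatorname{Tr}\Big(\big(\rho^{\frac12}\sigma^{\frac{1-\alpha}{\alpha}}\rho^{\frac12}\big)^\alpha\Big) \tag{4.46}$$

for $\rho,\sigma \in \mathcal{S}_\circ(A)$ with $\rho \ll \sigma$. First, we find that it is monotone under the pinching channel [122].


Lemma 4.2. For $\alpha > 1$, we have

$$\widetilde{Q}_\alpha(\rho\|\sigma) \ge \widetilde{Q}_\alpha(\mathcal{P}_\sigma(\rho)\|\sigma) \tag{4.47}$$

and the opposite inequality holds for $\alpha \in (0,1)$.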

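Proof. We have $\sigma^{\frac{1-\alpha}{2\alpha}}\mathcal{P}_\sigma(\rho)\,\sigma^{\frac{1-\alpha}{2\alpha}} = \mathcal{P}_\sigma\big(\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\big)$ since the pinching projectors commute
with $\sigma$. For $\alpha > 1$, we find

$$\widetilde{Q}_\alpha(\mathcal{P}_\sigma(\rho)\|\sigma) = \Big\|\mathcal{P}_\sigma\big(\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\big)\Big\|_\alpha^\alpha \le \big\|\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\big\|_\alpha^\alpha = \widetilde{Q}_\alpha(\rho\|\sigma), \tag{4.48}$$

where the inequality follows from the pinching inequality for norms (3.4). For $\alpha < 1$, the operator
Jensen inequality (2.64) establishes that $\big(\mathcal{P}_\sigma(\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}})\big)^\alpha \ge \mathcal{P}_\sigma\big((\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}})^\alpha\big)$. Thus,

$$\widetilde{Q}_\alpha(\mathcal{P}_\sigma(\rho)\|\sigma) \ge \operatorname{Tr}\Big(\mathcal{P}_\sigma\big((\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}})^\alpha\big)\Big) = \widetilde{Q}_\alpha(\rho\|\sigma). \tag{4.49}$$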


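The following general purpose inequalities will turn out to be very useful:

Lemma 4.3. For any $\rho \le \rho'$, we have $\widetilde{Q}_\alpha(\rho\|\sigma) \le \widetilde{Q}_\alpha(\rho'\|\sigma)$. Furthermore, if $\sigma \le \sigma'$ and $\alpha > 1$,
we have

$$\widetilde{Q}_\alpha(\rho\|\sigma) \ge \widetilde{Q}_\alpha(\rho\|\sigma') \tag{4.50}$$

and the opposite inequality holds for $\alpha \in [\frac12, 1)$.

Proof. Set $c = \frac{1-\alpha}{2\alpha}$. If $\rho \le \rho'$, then $\sigma^c\rho\,\sigma^c \le \sigma^c\rho'\sigma^c$ and the first statement follows from the
monotonicity of the trace of monotone functions (2.46).

To prove the second statement for $\alpha \in [\frac12, 1)$, we note that $t \mapsto t^{\frac{1-\alpha}{\alpha}}$ is operator monotone. Hence,

$$\rho^{\frac12}\sigma^{\frac{1-\alpha}{\alpha}}\rho^{\frac12} \le \rho^{\frac12}\sigma'^{\frac{1-\alpha}{\alpha}}\rho^{\frac12} \tag{4.51}$$

and the statement again follows by (2.46). Analogously, for $\alpha > 1$ we find that $t \mapsto t^{\frac{\alpha-1}{\alpha}}$ is operator
monotone and the inequality goes in the opposite direction. □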

In particular, the second statement establishes the dominance property (X). On the other hand,
we can employ the first inequality to get a very general pinching inequality. For any CP maps $\mathcal{E}$
and $\mathcal{F}$, and any $\alpha > 0$, we have

$$\widetilde{Q}_\alpha(\mathcal{E}(\rho)\|\mathcal{F}(\sigma)) \le |\mathrm{spec}(\sigma)|^\alpha\,\widetilde{Q}_\alpha\big(\mathcal{E}(\mathcal{P}_\sigma(\rho))\big\|\mathcal{F}(\sigma)\big). \tag{4.52}$$

A more delicate analysis is possible for the pinching case when $\alpha \in (0,2]$. We establish the
following stronger bounds [80]:

Lemma 4.4. For $\alpha \in [1,2]$, we have

$$\widetilde{Q}_\alpha(\rho\|\sigma) \le |\mathrm{spec}(\sigma)|^{\alpha-1}\,\widetilde{Q}_\alpha(\mathcal{P}_\sigma(\rho)\|\sigma) \tag{4.53}$$

and the opposite inequality holds for $\alpha \in (0,1]$.

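Proof. By the pinching inequality, we have $\rho \le |\mathrm{spec}(\sigma)|\, \mathcal{P}_\sigma(\rho)$. Then, we write

\[ \widetilde{Q}_\alpha(\rho\|\sigma) = \mathrm{Tr}\Big( \big( \sigma^{\frac{1-\alpha}{2\alpha}} \rho\, \sigma^{\frac{1-\alpha}{2\alpha}} \big)^{\alpha-1} \sigma^{\frac{1-\alpha}{2\alpha}} \rho\, \sigma^{\frac{1-\alpha}{2\alpha}} \Big). \tag{4.54} \]

Then, for $\alpha \in (1,2]$, we use the fact that $t \mapsto t^{\alpha-1}$ is operator monotone, such that the pinching inequality yields the following bound:

\[ \widetilde{Q}_\alpha(\rho\|\sigma) \le |\mathrm{spec}(\sigma)|^{\alpha-1}\, \mathrm{Tr}\Big( \big( \sigma^{\frac{1-\alpha}{2\alpha}} \mathcal{P}_\sigma(\rho)\, \sigma^{\frac{1-\alpha}{2\alpha}} \big)^{\alpha-1} \sigma^{\frac{1-\alpha}{2\alpha}} \rho\, \sigma^{\frac{1-\alpha}{2\alpha}} \Big). \tag{4.55} \]

Now, note that the pinching projectors commute with all operators except for the single $\rho$ in the term that we pulled out initially, and hence we can pinch this operator "for free". This yields

\[ \mathrm{Tr}\Big( \big( \sigma^{\frac{1-\alpha}{2\alpha}} \mathcal{P}_\sigma(\rho)\, \sigma^{\frac{1-\alpha}{2\alpha}} \big)^{\alpha-1} \sigma^{\frac{1-\alpha}{2\alpha}} \mathcal{P}_\sigma(\rho)\, \sigma^{\frac{1-\alpha}{2\alpha}} \Big) = \widetilde{Q}_\alpha(\mathcal{P}_\sigma(\rho)\|\sigma) \tag{4.56} \]

and we have established Eq. (4.53). Similarly, we proceed for $\alpha \in (0,1)$, where the pinching inequality again yields $\sigma^{\frac{1-\alpha}{2\alpha}} \rho\, \sigma^{\frac{1-\alpha}{2\alpha}} \le |\mathrm{spec}(\sigma)|\, \sigma^{\frac{1-\alpha}{2\alpha}} \mathcal{P}_\sigma(\rho)\, \sigma^{\frac{1-\alpha}{2\alpha}}$, and thus we have

\[ \big( \sigma^{\frac{1-\alpha}{2\alpha}} \rho\, \sigma^{\frac{1-\alpha}{2\alpha}} \big)^{\alpha-1} \ge |\mathrm{spec}(\sigma)|^{\alpha-1} \big( \sigma^{\frac{1-\alpha}{2\alpha}} \mathcal{P}_\sigma(\rho)\, \sigma^{\frac{1-\alpha}{2\alpha}} \big)^{\alpha-1} \tag{4.57} \]

on the support of $\sigma^{\frac{1-\alpha}{2\alpha}} \rho\, \sigma^{\frac{1-\alpha}{2\alpha}}$. Combining this with the development leading to (4.53) yields the desired bound. □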
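The pinching inequality and the bound (4.53) can be checked numerically on a small example. The following is a minimal sketch, assuming NumPy; the helper names `mpow`, `q_sandwiched` and `pinch`, the dimension, the seed and the spectrum of $\sigma$ are illustrative choices and not part of the text.

```python
import numpy as np

def mpow(a, p):
    # Power of a positive definite Hermitian matrix via its eigendecomposition.
    w, v = np.linalg.eigh(a)
    return (v * w**p) @ v.conj().T

def q_sandwiched(rho, sigma, alpha):
    # Q~_alpha(rho||sigma) = Tr[(sigma^g rho sigma^g)^alpha] with g = (1-alpha)/(2 alpha).
    s = mpow(sigma, (1 - alpha) / (2 * alpha))
    return np.trace(mpow(s @ rho @ s, alpha)).real

def pinch(rho, sigma, decimals=9):
    # Pinching map: sum_a P_a rho P_a over the spectral projectors P_a of sigma.
    w, v = np.linalg.eigh(sigma)
    keys = np.round(w, decimals)
    distinct = np.unique(keys)
    out = np.zeros_like(rho)
    for val in distinct:
        cols = v[:, keys == val]
        proj = cols @ cols.conj().T
        out += proj @ rho @ proj
    return out, len(distinct)

rng = np.random.default_rng(7)
g = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
rho = g @ g.conj().T
rho /= np.trace(rho).real
u, _ = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))
sigma = (u * np.array([0.5, 0.25, 0.25])) @ u.conj().T  # two distinct eigenvalues

pinched, n_spec = pinch(rho, sigma)
# Pinching inequality: |spec(sigma)| * P_sigma(rho) - rho is positive semi-definite.
gap = np.linalg.eigvalsh(n_spec * pinched - rho).min()

checks = {}
for alpha in (0.3, 0.7, 1.5, 2.0):
    lhs = q_sandwiched(rho, sigma, alpha)
    rhs = n_spec ** (alpha - 1) * q_sandwiched(pinched, sigma, alpha)
    # (4.53) for alpha in [1,2], the opposite inequality for alpha in (0,1].
    checks[alpha] = bool(lhs <= rhs + 1e-12) if alpha >= 1 else bool(lhs >= rhs - 1e-12)
```

A single random instance is of course no proof, but it is a quick sanity check that the direction of the inequality flips at $\alpha = 1$.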


A combination of the above Lemmas yields an alternative characterization of the minimal 
quantum Renyi divergence in terms of an asymptotic limit of classical Renyi divergences, as 
desired. 


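Proposition 4.4. For $\rho, \sigma \in \mathcal{S}(A)$ with $\rho \neq 0$, $\rho \ll \sigma$, and $\alpha \ge 0$, we have

\[ \widetilde{D}_\alpha(\rho\|\sigma) = \lim_{n\to\infty} \frac{1}{n} D_\alpha\big( \mathcal{P}_{\sigma^{\otimes n}}(\rho^{\otimes n}) \,\big\|\, \sigma^{\otimes n} \big). \]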


Proof. It suffices to show the statement for $\rho, \sigma \in \mathcal{S}_\circ(A)$. Summarizing Lemmas 4.2-4.4 yields





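\[ D_\alpha(\mathcal{P}_\sigma(\rho)\|\sigma) \le \widetilde{D}_\alpha(\rho\|\sigma) \tag{4.58} \]

\[ \le D_\alpha(\mathcal{P}_\sigma(\rho)\|\sigma) + \begin{cases} \log|\mathrm{spec}(\sigma)| & \text{for } \alpha \in (0,1) \cup (1,2] \\ \frac{\alpha}{\alpha-1} \log|\mathrm{spec}(\sigma)| & \text{for } \alpha > 2 \end{cases} \tag{4.59} \]

Since $\frac{\alpha}{\alpha-1} < 2$ for $\alpha > 2$, we can replace the correction term on the right-hand side by $2\log|\mathrm{spec}(\sigma)|$, which has the nice feature that it is independent of $\alpha$. Hence, for n-fold product states, additivity of $\widetilde{D}_\alpha$ gives

\[ \Big| \widetilde{D}_\alpha(\rho\|\sigma) - \frac{1}{n} D_\alpha\big( \mathcal{P}_{\sigma^{\otimes n}}(\rho^{\otimes n}) \,\big\|\, \sigma^{\otimes n} \big) \Big| \le \frac{2}{n} \log\big|\mathrm{spec}(\sigma^{\otimes n})\big|. \tag{4.60} \]

The result then follows by employing (4.41) in the limit $n \to \infty$.

Finally, we note that the convergence is uniform in $\alpha$ (as well as $\rho$ and $\sigma$), and thus the equality also holds for the limiting cases $\widetilde{D}_0$, $\widetilde{D}_1$ and $\widetilde{D}_\infty$. □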

The strength of this result lies in the fact that we immediately inherit some properties of the classical Rényi divergence. More precisely, $\alpha \mapsto \log \widetilde{Q}_\alpha(\rho\|\sigma)$ is the point-wise limit of a sequence of convex functions, and thus also convex.


Corollary 4.2. The function $\alpha \mapsto \log \widetilde{Q}_\alpha(\rho\|\sigma)$ is convex, and $\alpha \mapsto \widetilde{D}_\alpha(\rho\|\sigma)$ is monotonically increasing.
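Corollary 4.2 lends itself to a quick numerical illustration. The sketch below assumes NumPy and logarithms to base 2; the helpers `mpow`, `random_state` and `sandwiched`, the grid of orders and the seed are hypothetical choices made for this example only.

```python
import numpy as np

def mpow(a, p):
    # Power of a positive definite Hermitian matrix via its eigendecomposition.
    w, v = np.linalg.eigh(a)
    return (v * w**p) @ v.conj().T

def random_state(d, rng):
    # Random full-rank density matrix (normalized Wishart matrix).
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def sandwiched(rho, sigma, alpha):
    # Minimal (sandwiched) Renyi divergence of normalized states, in bits.
    s = mpow(sigma, (1 - alpha) / (2 * alpha))
    return np.log2(np.trace(mpow(s @ rho @ s, alpha)).real) / (alpha - 1)

rng = np.random.default_rng(1)
rho, sigma = random_state(3, rng), random_state(3, rng)
alphas = [0.5, 0.75, 0.9, 1.1, 1.5, 2.0, 3.0, 5.0]
values = [sandwiched(rho, sigma, a) for a in alphas]

# Monotonicity of alpha -> D~_alpha on the grid.
is_monotone = all(x <= y + 1e-9 for x, y in zip(values, values[1:]))

def log_q(alpha):
    # log Q~_alpha = (alpha - 1) * D~_alpha for normalized rho.
    return (alpha - 1) * sandwiched(rho, sigma, alpha)

# Midpoint convexity of alpha -> log Q~_alpha (midpoints avoid alpha = 1).
is_midpoint_convex = all(
    log_q((a + b) / 2) <= (log_q(a) + log_q(b)) / 2 + 1e-9
    for a, b in zip(alphas, alphas[2:])
)
```

The grid deliberately skips $\alpha = 1$, where the formula has a removable singularity.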


4.3.2 Limits and Special Cases 

Instead of evaluating the limits for $\alpha \to \infty$ explicitly as in [122], we can take advantage of the fact that Proposition 4.4 already gives an alternative characterization of the limiting quantity in terms of the pinched divergence. Hence, as Eq. (4.43) reveals, the limit is the quantum max-divergence of Definition 4.2 as claimed earlier.

In the limit $\alpha \to 1$, we expect to find the 'ordinary' quantum relative entropy or quantum divergence, first studied by Umegaki [166].


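Definition 4.4. For any state $\rho \in \mathcal{S}(A)$ with $\rho \neq 0$ and any $\sigma \in \mathcal{S}(A)$, we define the quantum divergence of $\sigma$ with $\rho$ as

\[ D(\rho\|\sigma) := \begin{cases} \dfrac{\mathrm{Tr}\big(\rho(\log\rho - \log\sigma)\big)}{\mathrm{Tr}(\rho)} & \text{if } \rho \ll \sigma \\ +\infty & \text{else} \end{cases} \tag{4.61} \]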


This reduces to the Kullback-Leibler (KL) divergence [103] if $\rho$ and $\sigma$ are classical (commuting) operators. We now prove that $\widetilde{D}_1(\rho\|\sigma) = D(\rho\|\sigma)$.

Proposition 4.5. For $\rho, \sigma \in \mathcal{S}(A)$ with $\rho \neq 0$, we find that $\widetilde{D}_1(\rho\|\sigma)$ equals

\[ \lim_{\alpha \searrow 1} \widetilde{D}_\alpha(\rho\|\sigma) = \lim_{\alpha \nearrow 1} \widetilde{D}_\alpha(\rho\|\sigma) = D(\rho\|\sigma). \tag{4.62} \]


The proof proceeds by finding an explicit expression for the limiting divergence [122,175]. (Alternatively one could show that the quantum relative entropy is achieved by pinching, as is done in [73].) We follow [175] here:

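Proof. Since the proposed limit satisfies the normalization property (III+), it is sufficient to evaluate the limit for $\rho, \sigma \in \mathcal{S}_\circ(A)$. Furthermore, we restrict our attention to the case $\rho \ll \sigma$. By l'Hôpital's rule and the fact that $\widetilde{Q}_1(\rho\|\sigma) = 1$, we have

\[ \lim_{\alpha \searrow 1} \widetilde{D}_\alpha(\rho\|\sigma) = \lim_{\alpha \nearrow 1} \widetilde{D}_\alpha(\rho\|\sigma) = \log(e) \cdot \frac{d}{d\alpha} \widetilde{Q}_\alpha(\rho\|\sigma) \Big|_{\alpha=1}. \tag{4.63} \]

To evaluate this derivative, it is convenient to introduce a continuously differentiable two-parameter function (for fixed $\rho$ and $\sigma$) as follows:

\[ q(r,z) = \mathrm{Tr}\Big( \big( \sigma^{\frac{r}{2}} \rho\, \sigma^{\frac{r}{2}} \big)^{z} \Big) \quad \text{with} \quad r(\alpha) = \frac{1-\alpha}{\alpha}, \quad z(\alpha) = \alpha, \tag{4.64} \]

such that $\frac{\partial r}{\partial \alpha} = -\frac{1}{\alpha^2}$ and $\frac{\partial z}{\partial \alpha} = 1$ and therefore

\[ \frac{d}{d\alpha} \widetilde{Q}_\alpha(\rho\|\sigma) \Big|_{\alpha=1} = -\frac{1}{\alpha^2} \frac{\partial}{\partial r} q(r,z) \Big|_{\alpha=1} + \frac{\partial}{\partial z} q(r,z) \Big|_{\alpha=1} \tag{4.65} \]

\[ = -\frac{d}{dr} \mathrm{Tr}(\sigma^{r} \rho) \Big|_{r=0} + \frac{d}{dz} \mathrm{Tr}(\rho^{z}) \Big|_{z=1} = \mathrm{Tr}\big( \rho (\ln\rho - \ln\sigma) \big). \tag{4.66} \]

In the penultimate step we exchanged the limits with the differentiation and in the last step we simply used the fact that the derivative commutes with the trace and that $\frac{d}{dz} \rho^{z} = \ln(\rho)\rho^{z}$. □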

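Let us have a look at two other special cases that are important for applications. First, at $\alpha \in \{\tfrac12, 2\}$, we find the negative logarithm of the quantum fidelity and the collision relative entropy [139], respectively. For $\rho, \sigma \in \mathcal{S}_\circ(A)$, we have

\[ \widetilde{D}_{1/2}(\rho\|\sigma) = -\log F(\rho,\sigma), \qquad \widetilde{D}_2(\rho\|\sigma) = \log \mathrm{Tr}\big( \rho\, \sigma^{-\frac12} \rho\, \sigma^{-\frac12} \big). \tag{4.67} \]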
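The special cases above, together with the limit of Proposition 4.5, can be verified numerically. This is a minimal sketch assuming NumPy, base-2 logarithms, full-rank normalized states, and the fidelity convention $F(\rho,\sigma) = (\mathrm{Tr}|\sqrt{\rho}\sqrt{\sigma}|)^2$; all function names and the seed are illustrative.

```python
import numpy as np

def mpow(a, p):
    # Power of a positive definite Hermitian matrix via its eigendecomposition.
    w, v = np.linalg.eigh(a)
    return (v * w**p) @ v.conj().T

def mlog2(a):
    # Base-2 logarithm of a positive definite Hermitian matrix.
    w, v = np.linalg.eigh(a)
    return (v * np.log2(w)) @ v.conj().T

def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def sandwiched(rho, sigma, alpha):
    s = mpow(sigma, (1 - alpha) / (2 * alpha))
    return np.log2(np.trace(mpow(s @ rho @ s, alpha)).real) / (alpha - 1)

rng = np.random.default_rng(2)
rho, sigma = random_state(3, rng), random_state(3, rng)

# alpha = 1/2: negative log-fidelity, with F = (trace norm of sqrt(rho) sqrt(sigma))^2.
d_half = sandwiched(rho, sigma, 0.5)
fidelity = np.linalg.svd(mpow(rho, 0.5) @ mpow(sigma, 0.5), compute_uv=False).sum() ** 2

# alpha = 2: collision relative entropy.
d_two = sandwiched(rho, sigma, 2.0)
s = mpow(sigma, -0.5)
collision = np.log2(np.trace(rho @ s @ rho @ s).real)

# alpha -> 1: Umegaki relative entropy, approached from both sides.
rel_ent = np.trace(rho @ (mlog2(rho) - mlog2(sigma))).real
below = sandwiched(rho, sigma, 1 - 1e-4)
above = sandwiched(rho, sigma, 1 + 1e-4)
```

Because $\alpha \mapsto \widetilde{D}_\alpha$ is monotonically increasing, the relative entropy should be sandwiched between the values just below and just above $\alpha = 1$.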


4.3.3 Data-Processing Inequality 

Here we show that $\widetilde{D}_\alpha$ satisfies the data-processing inequality for $\alpha \ge \tfrac12$. First, we show that our pinching inequalities in fact already imply the data-processing inequality for $\alpha > 1$, following an instructive argument due to Mosonyi and Ogawa in [119]. (For $\alpha \in [\tfrac12, 1)$ we will need a completely different argument.)

From Pinching to Measuring and Data-Processing


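First, we restrict our attention to $\alpha > 1$. According to (4.52), for any measurement map $\mathcal{M} \in \mathrm{CPTP}(A,X)$ with POVM elements $\{M_x\}_x$, we find

\[ \frac{Q_\alpha(\mathcal{M}(\rho)\|\mathcal{M}(\sigma))}{|\mathrm{spec}(\sigma)|^{\alpha}} \le Q_\alpha\big(\mathcal{M}(\mathcal{P}_\sigma(\rho)) \,\big\|\, \mathcal{M}(\sigma)\big) \tag{4.68} \]

\[ = \sum_x \big( \mathrm{Tr}(M_x \mathcal{P}_\sigma(\rho)) \big)^{\alpha} \big( \mathrm{Tr}(M_x \sigma) \big)^{1-\alpha} \tag{4.69} \]

\[ = \sum_x \big( \mathrm{Tr}(\mathcal{P}_\sigma(M_x) \mathcal{P}_\sigma(\rho)) \big)^{\alpha} \big( \mathrm{Tr}(\mathcal{P}_\sigma(M_x) \sigma) \big)^{1-\alpha}. \tag{4.70} \]

Now, note that $W(x|a) = \langle a | \mathcal{P}_\sigma(M_x) | a \rangle$ is a classical channel for states that are diagonal in the eigenbasis $\{|a\rangle\}_a$ of $\sigma$. Hence the classical data-processing inequality together with Lemma 4.2 yields

\[ |\mathrm{spec}(\sigma)|^{-\alpha}\, Q_\alpha(\mathcal{M}(\rho)\|\mathcal{M}(\sigma)) \le Q_\alpha(\mathcal{P}_\sigma(\rho)\|\sigma) \le \widetilde{Q}_\alpha(\rho\|\sigma). \tag{4.71} \]

Using a by now standard argument, we consider n-fold product states $\rho^{\otimes n}$ and $\sigma^{\otimes n}$ and a product measurement $\mathcal{M}^{\otimes n}$ in order to get rid of the spectral term in the limit as $n \to \infty$. This yields

\[ \widetilde{D}_\alpha(\rho\|\sigma) \ge D_\alpha(\mathcal{M}(\rho)\|\mathcal{M}(\sigma)) \tag{4.72} \]

for all measurement maps $\mathcal{M}$.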

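Combining this with Proposition 4.4 and interpreting the pinching map as a measurement in the eigenbasis of $\sigma$, we have established that, for $\alpha > 1$, the minimal quantum Rényi divergence is asymptotically achievable by a measurement:

\[ \widetilde{D}_\alpha(\rho\|\sigma) = \lim_{n\to\infty} \frac{1}{n} \max\big\{ D_\alpha\big( \mathcal{M}_n(\rho^{\otimes n}) \,\big\|\, \mathcal{M}_n(\sigma^{\otimes n}) \big) : \mathcal{M}_n \in \mathrm{CPTP}(A^n, X) \big\}. \tag{4.73} \]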

We will discuss this further below. Using the representation in (4.73) we can derive the data- 
processing inequality using a very general argument. 

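Proposition 4.6. Let $\mathbb{D}_\alpha$ be a quantum Rényi divergence satisfying (4.73). Then, it also satisfies data-processing (VIII).

Proof. We show that $\mathbb{D}_\alpha(\rho\|\sigma) \ge \mathbb{D}_\alpha(\mathcal{E}(\rho)\|\mathcal{E}(\sigma))$ for all $\mathcal{E} \in \mathrm{CPTP}(A,B)$ and $\rho, \sigma \in \mathcal{S}(A)$.

First note that since $\mathcal{E}$ is trace-preserving, $\mathcal{E}^\dagger$ is unital. For every measurement map $\mathcal{M} \in \mathrm{CPTP}(B,X)$ consisting of POVM elements $\{M_x\}_x$, we define the measurement map $\mathcal{M}^{\mathcal{E}} \in \mathrm{CPTP}(A,X)$ that consists of the POVM elements $\{\mathcal{E}^\dagger(M_x)\}_x$. Then, using (4.73) twice, we find

\[ \mathbb{D}_\alpha(\mathcal{E}(\rho)\|\mathcal{E}(\sigma)) = \lim_{n\to\infty} \frac{1}{n} \sup\big\{ D_\alpha\big( \mathcal{M}_n(\mathcal{E}(\rho)^{\otimes n}) \,\big\|\, \mathcal{M}_n(\mathcal{E}(\sigma)^{\otimes n}) \big) : \mathcal{M}_n \in \mathrm{CPTP}(B^n, X) \big\} \tag{4.74} \]

\[ = \lim_{n\to\infty} \frac{1}{n} \sup\big\{ D_\alpha\big( \mathcal{M}_n^{\mathcal{E}^{\otimes n}}(\rho^{\otimes n}) \,\big\|\, \mathcal{M}_n^{\mathcal{E}^{\otimes n}}(\sigma^{\otimes n}) \big) : \mathcal{M}_n \in \mathrm{CPTP}(B^n, X) \big\} \tag{4.75} \]

\[ \le \lim_{n\to\infty} \frac{1}{n} \sup\big\{ D_\alpha\big( \mathcal{M}_n(\rho^{\otimes n}) \,\big\|\, \mathcal{M}_n(\sigma^{\otimes n}) \big) : \mathcal{M}_n \in \mathrm{CPTP}(A^n, X) \big\} \tag{4.76} \]

\[ = \mathbb{D}_\alpha(\rho\|\sigma). \tag{4.77} \]

This concludes the proof. □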


Data-Processing via Joint Concavity 

Unfortunately, the first part of the above argument leading to (4.73) only goes through for $\alpha > 1$ (and consequently in the limits $\alpha \to 1$ and $\alpha \to \infty$). However, the data-processing inequality holds more generally for all $\alpha \ge \tfrac12$, as was shown by Frank and Lieb [57].

It thus remains to show data-processing for $\alpha \in [\tfrac12, 1)$. Here we show the following equivalent statement (cf. Proposition 4.2):

Proposition 4.7. The map $(\rho, \sigma) \mapsto \widetilde{Q}_\alpha(\rho\|\sigma)$ is jointly concave for $\alpha \in [\tfrac12, 1)$.

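Proof. First, we express $\widetilde{Q}_\alpha$ as a minimization problem. To do this, we use (3.16) and set $c = \frac{1-\alpha}{\alpha} \in (0,1]$, $M = \sigma^{\frac{c}{2}} \rho\, \sigma^{\frac{c}{2}}$, and pair it with the operator $\sigma^{-\frac{c}{2}} H \sigma^{-\frac{c}{2}}$, to find

\[ \widetilde{Q}_\alpha(\rho\|\sigma) = \mathrm{Tr}\Big( \big( \sigma^{\frac{c}{2}} \rho\, \sigma^{\frac{c}{2}} \big)^{\alpha} \Big) \le \alpha\, \mathrm{Tr}(H\rho) + (1-\alpha)\, \mathrm{Tr}\Big( \big( \sigma^{-\frac{c}{2}} H \sigma^{-\frac{c}{2}} \big)^{-\frac{1}{c}} \Big) \tag{4.78} \]

for all $H \ge 0$ with $H \gg \rho$, and equality can be achieved. Thus, we can write

\[ \widetilde{Q}_\alpha(\rho\|\sigma) = \min\Big\{ \alpha\, \mathrm{Tr}(H\rho) + (1-\alpha)\, \mathrm{Tr}\Big( \big( H^{-\frac12} \sigma^{c} H^{-\frac12} \big)^{\frac{1}{c}} \Big) : H \ge 0,\ H \gg \rho \Big\}. \tag{4.79} \]

This nicely splits the contributions of $\rho$ and $\sigma$ and we can deal with them separately. The term $\mathrm{Tr}(H\rho)$ is linear and thus concave in $\rho$. Next, we want to show that the second term is concave in $\sigma$. To do this, we further decompose it as follows, using essentially the same ideas that we used above. First, using (3.16), we find

\[ \mathrm{Tr}\big( H^{-\frac12} \sigma^{c} H^{-\frac12} X^{1-c} \big) \le c\, \mathrm{Tr}\Big( \big( H^{-\frac12} \sigma^{c} H^{-\frac12} \big)^{\frac{1}{c}} \Big) + (1-c)\, \mathrm{Tr}(X), \tag{4.80} \]

which allows us to write

\[ \mathrm{Tr}\Big( \big( H^{-\frac12} \sigma^{c} H^{-\frac12} \big)^{\frac{1}{c}} \Big) = \max\Big\{ \frac{1}{c}\, \mathrm{Tr}\big( H^{-\frac12} \sigma^{c} H^{-\frac12} X^{1-c} \big) - \frac{1-c}{c}\, \mathrm{Tr}(X) : X \ge 0 \Big\}. \tag{4.81} \]

Since $c \in (0,1)$, Lieb's concavity theorem (2.50) reveals that the function we maximize over is jointly concave in $\sigma$ and $X$. Note that generally the maximum of concave functions is not necessarily concave, but joint concavity in $\sigma$ and $X$ is sufficient to ensure that the maximum is concave in $\sigma$. Hence, $\widetilde{Q}_\alpha(\rho\|\sigma)$ is the minimum of a jointly concave function, and thus jointly concave. □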

The same proof strategy can be used to show that $\widetilde{Q}_\alpha(\rho\|\sigma)$ is jointly convex for $\alpha > 1$, but we already know that this holds due to our previous argument in Section 4.3.3 that established the data-processing inequality directly.


Summary and Remarks 

Let us now summarize the results of this subsection in the following theorem. 


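Theorem 4.1. Let $\alpha \ge \tfrac12$ and $\rho, \sigma \in \mathcal{S}(A)$ with $\rho \neq 0$. The minimal quantum Rényi divergence has the following properties:

• The functional $(\rho,\sigma) \mapsto \widetilde{Q}_\alpha(\rho\|\sigma)$ is jointly concave for $\alpha \in [\tfrac12, 1)$ and jointly convex for $\alpha \in (1, \infty)$.

• The functional $(\rho,\sigma) \mapsto \widetilde{D}_\alpha(\rho\|\sigma)$ is jointly convex for $\alpha \in [\tfrac12, 1]$.

• For every $\mathcal{E} \in \mathrm{CPTP}(A,B)$, the data-processing inequality holds, i.e.

\[ \widetilde{D}_\alpha(\rho\|\sigma) \ge \widetilde{D}_\alpha(\mathcal{E}(\rho)\|\mathcal{E}(\sigma)). \tag{4.82} \]

• It is asymptotically achievable by a measurement, i.e.

\[ \widetilde{D}_\alpha(\rho\|\sigma) = \lim_{n\to\infty} \frac{1}{n} \max\big\{ D_\alpha\big( \mathcal{M}_n(\rho^{\otimes n}) \,\big\|\, \mathcal{M}_n(\sigma^{\otimes n}) \big) : \mathcal{M}_n \in \mathrm{CPTP}(A^n, X) \big\}. \tag{4.83} \]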
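The data-processing inequality (4.82) can be illustrated for one concrete channel. The sketch below, assuming NumPy, uses the partial trace over the first qubit of a two-qubit system as the channel $\mathcal{E}$; the helper names, the seed and the grid of orders are hypothetical, and a numerical check on one random pair of states is only a sanity test, not a substitute for the proof.

```python
import numpy as np

def mpow(a, p):
    # Power of a positive definite Hermitian matrix via its eigendecomposition.
    w, v = np.linalg.eigh(a)
    return (v * w**p) @ v.conj().T

def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def sandwiched(rho, sigma, alpha):
    s = mpow(sigma, (1 - alpha) / (2 * alpha))
    return np.log2(np.trace(mpow(s @ rho @ s, alpha)).real) / (alpha - 1)

def trace_out_a(m):
    # Partial trace over the first qubit of a two-qubit operator (a CPTP map).
    return np.einsum('abac->bc', m.reshape(2, 2, 2, 2))

rng = np.random.default_rng(4)
rho_ab, sigma_ab = random_state(4, rng), random_state(4, rng)
rho_b, sigma_b = trace_out_a(rho_ab), trace_out_a(sigma_ab)

# D~_alpha before the channel minus D~_alpha after it, for several alpha >= 1/2.
gaps = {
    a: sandwiched(rho_ab, sigma_ab, a) - sandwiched(rho_b, sigma_b, a)
    for a in (0.5, 0.8, 1.5, 2.0, 4.0)
}
```

Every entry of `gaps` should be non-negative up to rounding, reflecting that discarding a subsystem cannot increase the divergence.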


A few remarks are in order here. First, note that one could potentially hope that the limit $n \to \infty$ in (4.83) is not necessary. However, except for the two boundary points $\alpha = \tfrac12$ and $\alpha = \infty$, it is generally not sufficient to just consider measurements on a single system. (This effect is also called "information locking".)

For $\alpha \in \{\tfrac12, \infty\}$, we have in fact (without proof)

\[ \widetilde{D}_\alpha(\rho\|\sigma) = \max\big\{ D_\alpha(\mathcal{M}(\rho)\|\mathcal{M}(\sigma)) : \mathcal{M} \in \mathrm{CPTP}(A,X) \big\}, \tag{4.84} \]

which has an interesting consequence. Namely, if we go through the proof of Proposition 4.6 we realize that we never use the fact that $\mathcal{E}$ is completely positive, and in fact the data-processing inequality holds for all positive trace-preserving maps. Generally, for all $\alpha$, the data-processing inequality holds if $\mathcal{E}^{\otimes n}$ is positive for all n, which is also strictly weaker than complete positivity.

The data-processing inequality together with definiteness of the classical Rényi divergence also establishes definiteness (VII-) of the minimal quantum Rényi divergence for $\alpha \ge \tfrac12$ and thus of all quantum Rényi divergences. Namely, if $\rho \neq \sigma$, then there exists a measurement (for example an informationally complete measurement) $\mathcal{M}$ such that $\mathcal{M}(\rho) \neq \mathcal{M}(\sigma)$, and thus

\[ \widetilde{D}_\alpha(\rho\|\sigma) \ge D_\alpha(\mathcal{M}(\rho)\|\mathcal{M}(\sigma)) > 0. \tag{4.85} \]

This completes the discussion of the minimal quantum Renyi divergence. 





The minimal quantum Rényi divergences satisfy Properties (I)-(X) for $\alpha \ge \tfrac12$ and thus constitute a family of Rényi divergences according to Definition 4.1.


4.4 Petz Quantum Renyi Divergence 

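A straightforward generalization of the classical expression to quantum states is given by the following expression, which was originally investigated by Petz [132].


Definition 4.5. Let $\alpha \in (0,1) \cup (1,\infty)$ and $\rho, \sigma \in \mathcal{S}(A)$ with $\rho \neq 0$. Then we define the Petz quantum Rényi divergence of $\sigma$ with $\rho$ as

\[ \bar{D}_\alpha(\rho\|\sigma) := \begin{cases} \dfrac{1}{\alpha-1} \log \dfrac{\mathrm{Tr}\big(\rho^{\alpha} \sigma^{1-\alpha}\big)}{\mathrm{Tr}(\rho)} & \text{if } (\alpha < 1 \wedge \rho \not\perp \sigma) \vee \rho \ll \sigma \\ +\infty & \text{else} \end{cases} \tag{4.86} \]

Moreover, $\bar{D}_0$ and $\bar{D}_1$ are defined as the respective limits of $\bar{D}_\alpha$ for $\alpha \to \{0,1\}$.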


This quantity turns out to have a clear operational interpretation in binary hypothesis testing, 
where it appears in the quantum generalization of the Chernoff and Hoeffding bounds. More 
surprisingly, it is also connected to the minimal quantum Renyi divergence via duality relations 
for conditional entropies, as we will see in the next chapter. 

We could as well have restricted the definition to $\alpha \in [0,2]$ since the quantity appears not to be useful outside this range. For $\alpha = 2$ it matches the maximal quantum Rényi divergence (cf. Figure 4.1) and it is also evident that

\[ \bar{Q}_\alpha(\rho\|\sigma) := \mathrm{Tr}\big( \rho^{\alpha} \sigma^{1-\alpha} \big) \tag{4.87} \]

is not convex in $\rho$ (for general $\sigma$) since $\rho^{\alpha}$ is not operator convex for $\alpha > 2$.


4.4.1 Data-Processing Inequality 

As a direct consequence of the Lieb concavity theorem and the Ando convexity theorem in (2.50), 
we find the following. 

Proposition 4.8. The functional $\bar{Q}_\alpha(\rho\|\sigma)$ is jointly concave for $\alpha \in (0,1)$ and jointly convex for $\alpha \in (1,2]$.

In particular, the Petz quantum Rényi divergence $\bar{D}_\alpha$ thus satisfies the data-processing inequality. As such, we must also have

\[ \bar{D}_\alpha(\rho\|\sigma) \ge \widetilde{D}_\alpha(\rho\|\sigma) \tag{4.88} \]

since the latter quantity is the smallest quantity that satisfies data-processing. This inequality is in fact also a direct consequence of the Araki-Lieb-Thirring trace inequalities [5,108], which we will not discuss further here.
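The ordering (4.88) between the Petz and the minimal divergence is easy to probe numerically. The following sketch assumes NumPy and full-rank normalized states; `petz` and `sandwiched` are illustrative helper names, and the diagonal pair at the end is an assumed example of commuting states, for which both quantities reduce to the classical Rényi divergence.

```python
import numpy as np

def mpow(a, p):
    # Power of a positive definite Hermitian matrix via its eigendecomposition.
    w, v = np.linalg.eigh(a)
    return (v * w**p) @ v.conj().T

def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def sandwiched(rho, sigma, alpha):
    s = mpow(sigma, (1 - alpha) / (2 * alpha))
    return np.log2(np.trace(mpow(s @ rho @ s, alpha)).real) / (alpha - 1)

def petz(rho, sigma, alpha):
    # Petz divergence of normalized states: log Tr(rho^alpha sigma^(1-alpha)) / (alpha - 1).
    q = np.trace(mpow(rho, alpha) @ mpow(sigma, 1 - alpha)).real
    return np.log2(q) / (alpha - 1)

rng = np.random.default_rng(3)
rho, sigma = random_state(3, rng), random_state(3, rng)
gaps = {a: petz(rho, sigma, a) - sandwiched(rho, sigma, a) for a in (0.3, 0.7, 1.5, 2.0, 3.0)}

# For commuting states both divergences coincide.
p, q = np.diag([0.6, 0.3, 0.1]), np.diag([0.2, 0.3, 0.5])
commuting_gap = abs(petz(p, q, 1.5) - sandwiched(p, q, 1.5))
```

The gap is typically strictly positive for non-commuting states and vanishes in the commuting case.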

Alternatively, the function $\bar{Q}_\alpha$ can be seen as a Petz quasi-entropy [132] (see also [85]). For this purpose, using the notation of Section 2.4.1, let us write

\[ \bar{Q}_\alpha(\rho\|\sigma) = \mathrm{Tr}\big( \rho^{\alpha} \sigma^{1-\alpha} \big) = \langle \Psi |\, \sigma^{\frac12} f_\alpha\big( \sigma^{-1} \otimes \rho^{T} \big) \sigma^{\frac12} \,| \Psi \rangle \tag{4.89} \]

where $f_\alpha : t \mapsto t^{\alpha}$ is operator concave or convex for $\alpha \in (0,1)$ and $\alpha \in (1,2]$, respectively. Petz used a variation of this representation to show the data-processing inequality.

We leave it as an exercise to verify the remaining properties mentioned in Secs. 4.1.1 and 4.1.2 for the Petz Rényi divergence.

The Petz quantum Rényi divergences satisfy Properties (I)-(X) for $\alpha \in (0,2]$.


4.4.2 Nussbaum-Szkola Distributions 


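The following representation due to Nussbaum and Szkoła [127] turns out to be quite useful in applications, and also allows us to further investigate the divergence. Let us fix $\rho, \sigma \in \mathcal{S}_\circ(A)$ and write their eigenvalue decomposition as

\[ \rho = \sum_x \lambda_x\, |e_x\rangle\langle e_x| \quad \text{and} \quad \sigma = \sum_y \mu_y\, |f_y\rangle\langle f_y|. \tag{4.90} \]

Then, the two probability mass functions

\[ P^{[\rho,\sigma]}_{XY}(x,y) = \lambda_x\, |\langle e_x | f_y \rangle|^2 \quad \text{and} \quad Q^{[\rho,\sigma]}_{XY}(x,y) = \mu_y\, |\langle e_x | f_y \rangle|^2 \tag{4.91} \]

mimic the Petz quantum divergence of the quantum states $\rho$ and $\sigma$. Namely, they satisfy

\[ D_\alpha\big( P^{[\rho,\sigma]}_{XY} \,\big\|\, Q^{[\rho,\sigma]}_{XY} \big) = \bar{D}_\alpha(\rho\|\sigma) \quad \text{for all } \alpha \ge 0. \tag{4.92} \]

Moreover, these distributions inherit some important properties of $\rho$ and $\sigma$. For example, $\rho \ll \sigma \iff P^{[\rho,\sigma]} \ll Q^{[\rho,\sigma]}$ and for product states we have

\[ P^{[\rho\otimes\tau,\, \sigma\otimes\omega]} = P^{[\rho,\sigma]} \otimes P^{[\tau,\omega]}. \tag{4.93} \]

Last but not least, since this representation is independent of $\alpha$, we are able to lift the convexity, monotonicity and limiting properties of $\alpha \mapsto D_\alpha$ to the quantum regime, as a corollary of the respective classical properties.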
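The Nussbaum-Szkoła construction is short enough to write out directly. The sketch below, assuming NumPy and full-rank normalized states, builds the two distributions from the eigendecompositions and compares the classical Rényi divergence with the Petz divergence; the variable names `P` and `Q` follow (4.91), everything else (seed, dimension, helper names) is an illustrative assumption.

```python
import numpy as np

def mpow(a, p):
    # Power of a positive definite Hermitian matrix via its eigendecomposition.
    w, v = np.linalg.eigh(a)
    return (v * w**p) @ v.conj().T

def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def petz(rho, sigma, alpha):
    q = np.trace(mpow(rho, alpha) @ mpow(sigma, 1 - alpha)).real
    return np.log2(q) / (alpha - 1)

def classical_renyi(p, q, alpha):
    # Classical Renyi divergence of two strictly positive distributions, in bits.
    return np.log2(np.sum(p**alpha * q ** (1 - alpha))) / (alpha - 1)

rng = np.random.default_rng(5)
rho, sigma = random_state(3, rng), random_state(3, rng)

lam, e = np.linalg.eigh(rho)     # rho   = sum_x lam[x] |e_x><e_x|
mu, f = np.linalg.eigh(sigma)    # sigma = sum_y mu[y]  |f_y><f_y|
overlap = np.abs(e.conj().T @ f) ** 2   # overlap[x, y] = |<e_x|f_y>|^2
P = lam[:, None] * overlap
Q = mu[None, :] * overlap

diffs = {a: abs(classical_renyi(P, Q, a) - petz(rho, sigma, a)) for a in (0.3, 0.7, 1.5, 2.0, 3.0)}
```

Both `P` and `Q` sum to one because the overlap matrix is doubly stochastic.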


Corollary 4.3. The function $\alpha \mapsto \log \bar{Q}_\alpha(\rho\|\sigma)$ is convex, $\alpha \mapsto \bar{D}_\alpha(\rho\|\sigma)$ is monotonically increasing, and

\[ \bar{D}_1(\rho\|\sigma) = \frac{\mathrm{Tr}\big( \rho (\log\rho - \log\sigma) \big)}{\mathrm{Tr}(\rho)}. \tag{4.94} \]


Fig. 4.2 Minimal and Petz quantum Rényi entropy around $\alpha = 1$.


So, in particular, $\bar{D}_1(\rho\|\sigma) = \widetilde{D}_1(\rho\|\sigma)$. This means that these two curves are tangential at this point and their first derivatives agree (cf. Figure 4.2).


First Derivative at a = 1 

In fact, the Nussbaum-Szkoła representation gives us a simple means to evaluate the first derivative of $\alpha \mapsto \bar{D}_\alpha(\rho\|\sigma)$ and $\alpha \mapsto \widetilde{D}_\alpha(\rho\|\sigma)$ at $\alpha = 1$, which will turn out to be useful later.

In order to do this, let us first take a step back and evaluate the derivative for classical probability mass functions $\rho_X, \sigma_X \in \mathcal{S}_\circ(X)$. Substituting $\alpha = 1 + \nu$ and introducing the log-likelihood ratio as a random variable $Z(X) = \ln(\rho(X)/\sigma(X))$, where $X$ is distributed according to the law $X \leftarrow \rho_X$, we find

\[ D_{1+\nu}(\rho_X\|\sigma_X) = \frac{1}{\nu} \log \mathbb{E}\big( e^{\nu Z} \big) = \log(e)\, \frac{G(\nu)}{\nu}, \tag{4.95} \]

where $G(\nu) := \ln \mathbb{E}(e^{\nu Z})$ is the cumulant generating function of $Z$.

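Clearly, $G(0) = 0$. Moreover, using l'Hôpital's rule, the first derivative of $\nu \mapsto G(\nu)/\nu$ at $\nu = 0$ is

\[ \lim_{\nu\to0} \frac{d}{d\nu} \Big( \frac{G(\nu)}{\nu} \Big) = \lim_{\nu\to0} \frac{\nu G'(\nu) - G(\nu)}{\nu^2} \tag{4.96} \]

\[ = \lim_{\nu\to0} \frac{G'(\nu) + \nu G''(\nu) - G'(\nu)}{2\nu} = \frac{G''(0)}{2}, \tag{4.97} \]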

which is one half of the second cumulant of Z. The second cumulant simply equals the second 
central moment, or variance, of the log-likelihood ratio Z. 
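\[ G''(0) = \mathbb{E}\big( (Z - \mathbb{E}(Z))^2 \big) = \mathbb{E}(Z^2) - \mathbb{E}(Z)^2 \tag{4.98} \]

\[ = \frac{V(\rho_X\|\sigma_X)}{\log(e)^2}. \tag{4.99} \]

Combining these steps, we have established that

\[ \frac{d}{d\alpha} D_\alpha(\rho_X\|\sigma_X) \Big|_{\alpha=1} = \frac{1}{2\log(e)}\, V(\rho_X\|\sigma_X). \tag{4.100} \]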
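The classical identity (4.100) can be confirmed with a finite difference. The sketch below uses only the Python standard library and base-2 logarithms, so that $\log(e) = \log_2 e$; the two distributions and the step size `h` are arbitrary illustrative choices.

```python
import math

def renyi_divergence(p, q, alpha):
    # Classical Renyi divergence in bits, for alpha != 1.
    total = sum(a**alpha * b ** (1 - alpha) for a, b in zip(p, q))
    return math.log2(total) / (alpha - 1)

p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

# Log-likelihood ratio in bits, its mean (the KL divergence) and its variance V.
llr = [math.log2(a / b) for a, b in zip(p, q)]
kl = sum(a * z for a, z in zip(p, llr))
variance = sum(a * (z - kl) ** 2 for a, z in zip(p, llr))

# Central finite difference of alpha -> D_alpha around alpha = 1.
h = 1e-4
upper = renyi_divergence(p, q, 1 + h)
lower = renyi_divergence(p, q, 1 - h)
slope = (upper - lower) / (2 * h)

# Prediction from (4.100): V / (2 log e).
predicted = variance / (2 * math.log2(math.e))
midpoint = (upper + lower) / 2
```

The central difference avoids evaluating the formula at $\alpha = 1$ itself, where it has a removable singularity; the midpoint of the two evaluations approximates the KL divergence.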


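Now we can simply substitute the Nussbaum-Szkoła distributions to lift this result to the Petz quantum Rényi divergence, and thus also the minimal quantum Rényi divergence. We recover the following result [109]:


Proposition 4.9. Let $\rho, \sigma \in \mathcal{S}_\circ(A)$ with $\rho \ll \sigma$. Then the functions $\alpha \mapsto \widetilde{D}_\alpha(\rho\|\sigma)$ and $\alpha \mapsto \bar{D}_\alpha(\rho\|\sigma)$ are continuously differentiable at $\alpha = 1$ and

\[ \frac{d}{d\alpha} \widetilde{D}_\alpha(\rho\|\sigma) \Big|_{\alpha=1} = \frac{d}{d\alpha} \bar{D}_\alpha(\rho\|\sigma) \Big|_{\alpha=1} = \frac{1}{2\log(e)}\, V(\rho\|\sigma), \tag{4.101} \]

where $V(\rho\|\sigma) := \mathrm{Tr}\big( \rho\, (\log\rho - \log\sigma - D(\rho\|\sigma))^2 \big)$.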


The minimal and Petz quantum Rényi divergences are thus differentiable at $\alpha = 1$ and in fact infinitely differentiable. Hence, by Taylor's theorem, for every interval $[a,b]$ containing 1, there exist constants $K \in \mathbb{R}_+$ such that, for all $\alpha \in [a,b]$, we have

\[ \Big| \widetilde{D}_\alpha(\rho\|\sigma) - D(\rho\|\sigma) - (\alpha-1)\, \frac{1}{2\log(e)}\, V(\rho\|\sigma) \Big| \le K (\alpha-1)^2. \tag{4.102} \]

The same statement naturally also holds if we replace $\widetilde{D}_\alpha$ with $\bar{D}_\alpha$. An example of the first-order Taylor series approximation is plotted in Figure 4.2.


4.5 Background and Further Reading 

Shannon was first to derive the definition of entropy axiomatically [144] and many have followed in his footsteps since. We exclusively consider Rényi's approach [142] here, but a recent overview of different axiomatizations can be found in [40].

The Belavkin-Staszewski relative entropy [17] was considered a reasonable alternative to Umegaki's relative entropy [166] until Hiai and Petz [86] established the operational interpretation of Umegaki's definition in quantum hypothesis testing. The proof that joint convexity implies data-processing is rather standard and mimics a development for the relative entropy that is due to Uhlmann [163,164] and Lindblad [110,111]. The data-processing inequality for the quantum relative entropy has been shown in these works, building on previous work by Lieb and Ruskai [107] that established it for the partial trace. The data-processing inequality can be strengthened by including a remainder term that characterizes how well the channel can be recovered. This has been shown by Fawzi and Renner [53] for the partial trace (see also [25,29] for refinements and simplifications of the proof). Recently these results were extended to general channels in [173] (see also [23]) and further refined in [149].

The max-divergence was first formally introduced by Datta [41], based on Renner's work [139] treating conditional entropy. However, the idea to define a quantum relative entropy via an operator inequality appears implicitly in earlier literature, for example in the work of Jain, Radhakrishnan, and Sen [94]. The minimal (or sandwiched) quantum Rényi divergence was formally introduced independently in [122] and [175]. Some ideas resulting in the former work were already presented publicly in [153] and [54], and partial results were published in [121] and [50, Th. 21]. The initial works only proved a few properties of the divergence and left others as conjectures. Various other authors then contributed by showing data-processing for certain ranges of $\alpha$ concurrently with Frank and Lieb [57]. Notably, Müller-Lennert et al. [122] already established data-processing for $\alpha \in (1,2]$ and conjectured it for all $\alpha \ge \tfrac12$. Concurrently with [57], Beigi [15] provided a proof for data-processing for $\alpha > 1$ and Mosonyi and Ogawa [119] provided the proof discussed above, which is also only valid for $\alpha > 1$. Their proof in turn uses some of Hayashi's ideas [75].

The minimal, maximal and Petz quantum Rényi divergence are by no means the only quantum generalizations of the Rényi divergence. For example, a two-parameter family of Rényi divergences proposed by Jakšić et al. [95] and further investigated by Audenaert and Datta [9] (see also [84] and [35]) captures both the minimal and Petz quantum Rényi divergence.

Both quantum Rényi divergences discussed in this work have found applications beyond binary quantum hypothesis testing. In particular, the minimal quantum Rényi divergence has turned out to be a very useful tool in order to establish the strong converse property for various information theoretic tasks. Most prominently it led to a strong converse for classical communication over entanglement-breaking channels [175], the entanglement-assisted capacity [68], and the quantum capacity of dephasing channels [161]. Furthermore, the strong converse exponents for coding over classical-quantum channels can be expressed in terms of the minimal quantum Rényi divergence [120]. The minimal quantum Rényi divergence of order 2 can also be used to derive various achievability results [16]. Besides this, the quantum Rényi divergences have also found applications in quantum thermodynamics, e.g. in the study of the second law of thermodynamics [30], and in quantum cryptography, e.g. in [115].

Finally, we note that many of the definitions discussed here are perfectly sensible for infinite-dimensional quantum systems. However, some of the proofs we presented here do not directly generalize to this setting. Ohya and Petz's book [129] treats quantum entropies in the even more general algebraic setting. However, a comprehensive investigation of the minimal quantum Rényi divergence in the infinite-dimensional or algebraic setting is missing.



Chapter 5 

Conditional Renyi Entropy 


Conditional entropies are measures of the uncertainty inherent in a system from the perspective of an observer who is given side information on the system. The system as well as the side information can be either classical or quantum. The goal in this chapter is to define conditional Rényi entropies that are operationally significant measures of this uncertainty, and to explore their properties. Unconditional entropies are then simply a special case of conditional entropies where the side information is uncorrelated with the system under observation.

We want the conditional Rényi entropies to retain most of the properties of the conditional von Neumann entropy, which is by now well established in quantum information theory. Most prominently, we expect that they satisfy a data-processing inequality: we require that the uncertainty of the system never decreases when the quantum system containing side information undergoes a physical evolution. This can be ensured by defining Rényi entropies in terms of the Rényi divergence, in analogy with the case of conditional von Neumann entropy.


5.1 Conditional Entropy from Divergence 

Let us first recall Shannon's definition of conditional entropy. For a joint probability mass function $\rho(x,y)$ with marginals $\rho(x)$ and $\rho(y)$, the conditional Shannon entropy is given as

\[ H(X|Y)_\rho = \sum_y \rho(y)\, H(X|Y=y)_\rho \tag{5.1} \]

\[ = \sum_y \rho(y) \sum_x \rho(x|y) \log \frac{1}{\rho(x|y)} \tag{5.2} \]

\[ = \sum_{x,y} \rho(x,y) \log \frac{\rho(y)}{\rho(x,y)} \tag{5.3} \]

\[ = H(XY)_\rho - H(Y)_\rho, \tag{5.4} \]

where we used the conditional probability distribution $\rho(x|y) = \rho(x,y)/\rho(y)$, and the corresponding Shannon entropy, $H(X|Y=y)_\rho$. Such conditional distributions are ubiquitous in classical information theory, but it is not immediate how to generalize this concept to quantum information. Instead, we avoid this issue altogether by generalizing the expression in (5.4), which is also called the chain rule of the Shannon entropy. This yields the following definition for the quantum conditional entropy.


Definition 5.1. For any bipartite state $\rho_{AB} \in \mathcal{S}_\circ(AB)$, we define the conditional von Neumann entropy of A given B for the state $\rho_{AB}$ as

\[ H(A|B)_\rho := H(AB)_\rho - H(B)_\rho, \quad \text{where} \quad H(A)_\rho := -\mathrm{Tr}(\rho_A \log \rho_A). \tag{5.5} \]


Here, $H(A)_\rho$ is the von Neumann entropy [170] and simply corresponds to the Shannon entropy of the state's eigenvalues. One of the most remarkable properties of the von Neumann entropy is strong subadditivity. It states that for any tripartite state $\rho_{ABC} \in \mathcal{S}_\circ(ABC)$, we have

\[ H(ABC)_\rho + H(B)_\rho \le H(AB)_\rho + H(BC)_\rho \tag{5.6} \]

or, equivalently, $H(A|BC)_\rho \le H(A|B)_\rho$. The latter is an expression of another principle, the data-processing inequality. It states that any processing of the side information system, in this case taking a partial trace, can at most increase the uncertainty of A. Formally, for any map $\mathcal{E} \in \mathrm{CPTP}(B,B')$ we have

\[ H(A|B)_\rho \le H(A|B')_\tau, \quad \text{where} \quad \tau_{AB'} = \mathcal{E}(\rho_{AB}). \tag{5.7} \]

This property of the von Neumann entropy was first proven by Lieb and Ruskai [107]. It implies weak subadditivity, and the relation [6]

\[ |H(A)_\rho - H(B)_\rho| \le H(AB)_\rho \le H(A)_\rho + H(B)_\rho. \tag{5.8} \]

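The conditional entropy can be conveniently expressed in terms of Umegaki's relative entropy, namely

\[ H(A|B)_\rho = H(AB)_\rho - H(B)_\rho \tag{5.9} \]

\[ = -\mathrm{Tr}(\rho_{AB} \log \rho_{AB}) + \mathrm{Tr}(\rho_B \log \rho_B) \tag{5.10} \]

\[ = -\mathrm{Tr}\big( \rho_{AB} ( \log \rho_{AB} - \log(I_A \otimes \rho_B) ) \big) \tag{5.11} \]

\[ = -D(\rho_{AB} \| I_A \otimes \rho_B). \tag{5.12} \]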


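Here, we used that $\log(I_A \otimes \rho_B) = I_A \otimes \log \rho_B$ to establish (5.11). Sometimes it is useful to rephrase this expression as an optimization problem. Based on (5.11) we can introduce an auxiliary state $\sigma_B \in \mathcal{S}_\circ(B)$ and write

\[ H(A|B)_\rho = -\mathrm{Tr}\big( \rho_{AB} ( \log \rho_{AB} - I_A \otimes \log \sigma_B ) \big) + \mathrm{Tr}\big( \rho_B ( \log \rho_B - \log \sigma_B ) \big) \tag{5.13} \]

\[ = -D(\rho_{AB} \| I_A \otimes \sigma_B) + D(\rho_B \| \sigma_B). \tag{5.14} \]

Since the latter divergence is always non-negative and equals zero if and only if $\sigma_B = \rho_B$, this yields the following expression for the conditional entropy:

\[ H(A|B)_\rho = \max_{\sigma_B \in \mathcal{S}_\circ(B)} -D(\rho_{AB} \| I_A \otimes \sigma_B). \tag{5.15} \]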
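The chain (5.9)-(5.15) can be checked on a random two-qubit state. The sketch below assumes NumPy, base-2 logarithms and full-rank states; the helper names and the handful of random trial states $\sigma_B$ are illustrative, and the trial states only probe the maximization in (5.15) rather than solve it.

```python
import numpy as np

def mlog2(a):
    # Base-2 logarithm of a positive definite Hermitian matrix.
    w, v = np.linalg.eigh(a)
    return (v * np.log2(w)) @ v.conj().T

def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def entropy(rho):
    # Von Neumann entropy in bits.
    w = np.linalg.eigvalsh(rho)
    return float(-np.sum(w * np.log2(w)))

def rel_entropy(rho, sigma):
    # Tr(rho (log rho - log sigma)); sigma need not be normalized.
    return np.trace(rho @ (mlog2(rho) - mlog2(sigma))).real

def trace_out_a(m):
    # Partial trace over the first qubit of a two-qubit operator.
    return np.einsum('abac->bc', m.reshape(2, 2, 2, 2))

rng = np.random.default_rng(6)
rho_ab = random_state(4, rng)
rho_b = trace_out_a(rho_ab)

h_cond = entropy(rho_ab) - entropy(rho_b)                     # (5.9)
via_div = -rel_entropy(rho_ab, np.kron(np.eye(2), rho_b))     # (5.12)

# Any other sigma_B can only give a smaller value, cf. (5.14) and (5.15).
trials = [-rel_entropy(rho_ab, np.kron(np.eye(2), random_state(2, rng))) for _ in range(5)]
```

Here `np.kron(np.eye(2), sigma_b)` represents $I_A \otimes \sigma_B$ with A as the first tensor factor, matching the index convention used in `trace_out_a`.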
5.2 Definitions and Properties

In the case of quantum Renyi entropies, it is not immediate which of the relations (5.9), (5.12) 
or (5.15) should be used to define the conditional Renyi entropies. It has been found in the study 
of the classical special case (see, e.g. [55, 93]) that generalizations based on (5.9) have severe 
limitations, for example they generally do not satisfy a data-processing inequality. On the other 
hand, definitions based on the underlying divergence, as in (5.12) or (5.15), have proven to be very 
fruitful and lead to quantities with operational significance and useful mathematical properties. 

Together with the two proposed quantum generalizations of the Rényi divergence, $\bar{D}_\alpha$ and $\widetilde{D}_\alpha$, this leads to a total of four different candidates for conditional Rényi entropies [122, 154, 155].


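Definition 5.2. For $\alpha \ge 0$ and $\rho_{AB} \in \mathcal{S}_\circ(AB)$, we define the following quantum conditional Rényi entropies of A given B of the state $\rho_{AB}$:

\[ \bar{H}^{\downarrow}_\alpha(A|B)_\rho := -\bar{D}_\alpha(\rho_{AB} \| I_A \otimes \rho_B), \tag{5.16} \]

\[ \bar{H}^{\uparrow}_\alpha(A|B)_\rho := \sup_{\sigma_B \in \mathcal{S}_\circ(B)} -\bar{D}_\alpha(\rho_{AB} \| I_A \otimes \sigma_B), \tag{5.17} \]

\[ \widetilde{H}^{\downarrow}_\alpha(A|B)_\rho := -\widetilde{D}_\alpha(\rho_{AB} \| I_A \otimes \rho_B), \quad \text{and} \tag{5.18} \]

\[ \widetilde{H}^{\uparrow}_\alpha(A|B)_\rho := \sup_{\sigma_B \in \mathcal{S}_\circ(B)} -\widetilde{D}_\alpha(\rho_{AB} \| I_A \otimes \sigma_B). \tag{5.19} \]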


Note that for $\alpha > 1$ the optimization over $\sigma_B$ can always be restricted to $\sigma_B$ with support equal to the support of $\rho_B$. Moreover, since small eigenvalues of $\sigma_B$ lead to a large divergence, we can further restrict $\sigma_B$ to a compact set of states with eigenvalues bounded away from 0. Since we are thus optimizing a continuous function over a compact set, we are justified in writing a maximum in the above definitions. Furthermore, pulling the optimization inside the logarithm, we see that these optimization problems are either convex (for $\alpha > 1$) or concave (for $\alpha < 1$).

Consistent with the notation of the preceding chapter, we also use $\mathbb{H}_\alpha$ to refer to any of the four entropies and $H_\alpha$ to refer to the respective classical quantities. More precisely, we use $\mathbb{H}_\alpha$ only to refer to quantum conditional Rényi entropies that satisfy data-processing, which (as we will see in Sec. 5.2.3) means that $\mathbb{H}_\alpha$ encompasses $\bar{H}_\alpha$ for $\alpha \in [0,2]$ and $\widetilde{H}_\alpha$ for $\alpha \in [\tfrac12, \infty]$. For a trivial system B, we find that

\[ \mathbb{H}_\alpha(A)_\rho = -\mathbb{D}_\alpha(\rho_A \| I_A) = \frac{\alpha}{1-\alpha} \log \|\rho_A\|_\alpha \tag{5.20} \]

reduces to the classical Rényi entropy of the eigenvalues of $\rho_A$. In particular, if $\alpha = 1$, we always recover the von Neumann entropy.

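Finally, note that we use the symbols '↑' and '↓' to express the observation that

\[ \bar{H}^{\uparrow}_\alpha(A|B)_\rho \ge \bar{H}^{\downarrow}_\alpha(A|B)_\rho \quad \text{and} \quad \widetilde{H}^{\uparrow}_\alpha(A|B)_\rho \ge \widetilde{H}^{\downarrow}_\alpha(A|B)_\rho, \tag{5.21} \]

which follows trivially from the respective definitions. Furthermore, the Araki-Lieb-Thirring inequality in (4.88) yields the relations

\[ \widetilde{H}^{\uparrow}_\alpha(A|B)_\rho \ge \bar{H}^{\uparrow}_\alpha(A|B)_\rho \quad \text{and} \quad \widetilde{H}^{\downarrow}_\alpha(A|B)_\rho \ge \bar{H}^{\downarrow}_\alpha(A|B)_\rho. \tag{5.22} \]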
[Figure: diagram relating $\widetilde{H}^{\uparrow}_\alpha(A|B)_\rho$, $\bar{H}^{\uparrow}_\alpha(A|B)_\rho$, $\widetilde{H}^{\downarrow}_\alpha(A|B)_\rho$ and $\bar{H}^{\downarrow}_\alpha(A|B)_\rho$.]

Fig. 5.1 Overview of the different conditional entropies used in this paper. Arrows indicate that one entropy is larger or equal to the other for all states $\rho_{AB} \in \mathcal{S}_\circ(AB)$ and all $\alpha \ge 0$.



These relations are summarized in Fig. 5.1. 


Limits and Special Cases 

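Inheriting these properties from the corresponding divergences, all entropies are monotonically decreasing functions of $\alpha$, and we recover many interesting special cases in the limits $\alpha \to \{0, 1, \infty\}$.

For $\alpha = 1$, all definitions coincide with the usual von Neumann conditional entropy (5.12). For $\alpha = \infty$, two quantum generalizations of the conditional min-entropy emerge, both of which have been studied by Renner [139]. Namely,

\[ \widetilde{H}^{\downarrow}_\infty(A|B)_\rho = \sup\big\{ \lambda \in \mathbb{R} : \rho_{AB} \le 2^{-\lambda} I_A \otimes \rho_B \big\} \quad \text{and} \tag{5.23} \]

\[ \widetilde{H}^{\uparrow}_\infty(A|B)_\rho = \sup\big\{ \lambda \in \mathbb{R} : \exists\, \sigma_B \in \mathcal{S}_\circ(B) \text{ such that } \rho_{AB} \le 2^{-\lambda} I_A \otimes \sigma_B \big\}. \tag{5.24} \]

For $\alpha = \tfrac12$, we find the conditional max-entropy studied by König et al. [101],¹

\[ \widetilde{H}^{\uparrow}_{1/2}(A|B)_\rho = \sup_{\sigma_B \in \mathcal{S}_\circ(B)} \log F(\rho_{AB}, I_A \otimes \sigma_B). \tag{5.25} \]

For $\alpha = 2$, we find a quantum conditional collision entropy [139]:

\[ \widetilde{H}^{\downarrow}_2(A|B)_\rho = -\log \mathrm{Tr}\Big( \rho_{AB} \big( I_A \otimes \rho_B^{-\frac12} \big) \rho_{AB} \big( I_A \otimes \rho_B^{-\frac12} \big) \Big). \tag{5.26} \]

For $\alpha = 0$, we find a generalization of the Hartley entropy [72], proposed in [139]:

\[ \bar{H}^{\uparrow}_0(A|B)_\rho = \sup_{\sigma_B \in \mathcal{S}_\circ(B)} \log \mathrm{Tr}\big( \{\rho_{AB} > 0\}\, I_A \otimes \sigma_B \big). \tag{5.27} \]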


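5.2.1 Alternative Expression for $\bar{H}^{\uparrow}_\alpha$

For the quantity $\bar{H}^{\uparrow}_\alpha$ we find a closed-form expression for the optimal (minimal or maximal) $\sigma_B$. This yields an alternative expression for $\bar{H}^{\uparrow}_\alpha$ as follows [145,154].


¹ The notation $H_{\min}(A|B)_{\rho|\rho} \equiv \widetilde{H}^{\downarrow}_\infty(A|B)_\rho$ and $H_{\min}(A|B)_\rho \equiv \widetilde{H}^{\uparrow}_\infty(A|B)_\rho$ is widely used. The alternative notation $H_{\max}(A|B)_\rho \equiv \widetilde{H}^{\uparrow}_{1/2}(A|B)_\rho$ is often used too, for example in Chapter 6.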
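Lemma 5.1. Let $\alpha \in (0,1) \cup (1,\infty)$ and $\rho_{AB} \in \mathcal{S}_\circ(AB)$. Then,

\[ \bar{H}^{\uparrow}_\alpha(A|B)_\rho = \frac{\alpha}{1-\alpha} \log \mathrm{Tr}\Big( \big( \mathrm{Tr}_A(\rho_{AB}^{\alpha}) \big)^{\frac{1}{\alpha}} \Big). \tag{5.28} \]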


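Proof. Recall the definition

\[ \bar{H}^{\uparrow}_\alpha(A|B)_\rho = \sup_{\sigma_B \in \mathcal{S}_\circ(B)} \frac{1}{1-\alpha} \log \mathrm{Tr}\big( \rho_{AB}^{\alpha}\, \sigma_B^{1-\alpha} \big) = \sup_{\sigma_B \in \mathcal{S}_\circ(B)} \frac{1}{1-\alpha} \log \mathrm{Tr}\big( \mathrm{Tr}_A(\rho_{AB}^{\alpha})\, \sigma_B^{1-\alpha} \big). \tag{5.29} \]

This can immediately be lower bounded by the expression in (5.28) by substituting

\[ \sigma_B^{*} = \frac{\big( \mathrm{Tr}_A(\rho_{AB}^{\alpha}) \big)^{\frac{1}{\alpha}}}{\mathrm{Tr}\Big( \big( \mathrm{Tr}_A(\rho_{AB}^{\alpha}) \big)^{\frac{1}{\alpha}} \Big)} \tag{5.30} \]

for $\sigma_B$. It remains to show that this choice is optimal. For $\alpha < 1$, we employ the Hölder inequality in (3.5) for $p = \frac{1}{\alpha}$, $q = \frac{1}{1-\alpha}$, $L = \mathrm{Tr}_A(\rho_{AB}^{\alpha})$ and $K = \sigma_B^{1-\alpha}$ to find

\[ \mathrm{Tr}\big( \mathrm{Tr}_A(\rho_{AB}^{\alpha})\, \sigma_B^{1-\alpha} \big) \le \Big( \mathrm{Tr}\Big( \big( \mathrm{Tr}_A(\rho_{AB}^{\alpha}) \big)^{\frac{1}{\alpha}} \Big) \Big)^{\alpha} \big( \mathrm{Tr}(\sigma_B) \big)^{1-\alpha}, \tag{5.31} \]

which yields the desired upper bound since $\mathrm{Tr}(\sigma_B) = 1$. For $\alpha > 1$, we instead use the reverse Hölder inequality (3.6). This leads us to (5.28) upon the same substitutions. □


In particular, note that (5.30) gives an explicit expression for the optimal $\sigma_B$ in the definition of $\bar{H}^{\uparrow}_\alpha$. A similar closed-form expression for the optimal $\sigma_B$ in the definition of $\widetilde{H}^{\uparrow}_\alpha$ is however not known.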
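Lemma 5.1 and the optimizer (5.30) can be cross-checked numerically. The sketch below assumes NumPy, base-2 logarithms and a full-rank two-qubit state; the function names `h_up_closed`, `neg_petz_cond` and `optimal_sigma` are hypothetical, and the random trial states only probe optimality rather than prove it.

```python
import numpy as np

def mpow(a, p):
    # Power of a positive definite Hermitian matrix via its eigendecomposition.
    w, v = np.linalg.eigh(a)
    return (v * w**p) @ v.conj().T

def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def trace_out_a(m):
    # Partial trace over the first qubit of a two-qubit operator.
    return np.einsum('abac->bc', m.reshape(2, 2, 2, 2))

def h_up_closed(rho_ab, alpha):
    # Closed form (5.28).
    t = trace_out_a(mpow(rho_ab, alpha))
    return alpha / (1 - alpha) * np.log2(np.trace(mpow(t, 1 / alpha)).real)

def neg_petz_cond(rho_ab, sigma_b, alpha):
    # Objective in (5.29): log Tr[Tr_A(rho_AB^alpha) sigma_B^(1-alpha)] / (1 - alpha).
    t = trace_out_a(mpow(rho_ab, alpha))
    return np.log2(np.trace(t @ mpow(sigma_b, 1 - alpha)).real) / (1 - alpha)

def optimal_sigma(rho_ab, alpha):
    # Candidate optimizer (5.30).
    s = mpow(trace_out_a(mpow(rho_ab, alpha)), 1 / alpha)
    return s / np.trace(s).real

rng = np.random.default_rng(8)
rho_ab = random_state(4, rng)
results = {}
for alpha in (0.5, 2.0):
    closed = h_up_closed(rho_ab, alpha)
    at_opt = neg_petz_cond(rho_ab, optimal_sigma(rho_ab, alpha), alpha)
    best_trial = max(neg_petz_cond(rho_ab, random_state(2, rng), alpha) for _ in range(5))
    results[alpha] = (closed, at_opt, best_trial)
```

For each order, the closed form should match the objective evaluated at (5.30), and no random trial state should exceed it.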


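5.2.2 Conditioning on Classical Information

We now analyze the behavior of $\mathbb{D}_\alpha$ and $\mathbb{H}_\alpha$ when applied to partly classical states. Formally, consider normalized classical-quantum states of the form $\rho_{XA} = \sum_x \rho(x)\, |x\rangle\langle x| \otimes \hat\rho_A(x)$ and $\sigma_{XA} = \sum_x \sigma(x)\, |x\rangle\langle x| \otimes \hat\sigma_A(x)$. A straightforward calculation using Property (VI) shows that for two such states,

\[ \mathbb{D}_\alpha(\rho_{XA} \| \sigma_{XA}) = \frac{1}{\alpha-1} \log \Big( \sum_x \rho(x)^{\alpha} \sigma(x)^{1-\alpha} \exp\big( (\alpha-1)\, \mathbb{D}_\alpha(\hat\rho_A(x) \| \hat\sigma_A(x)) \big) \Big). \tag{5.32} \]

In other words, the divergence $\mathbb{D}_\alpha(\rho_{XA} \| \sigma_{XA})$ decomposes into the divergences $\mathbb{D}_\alpha(\hat\rho_A(x) \| \hat\sigma_A(x))$ of the 'conditional' states. This leads to the following relations for conditional Rényi entropies.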


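Proposition 5.1. Let $\rho_{ABY} = \sum_y \rho(y)\, \hat\rho_{AB}(y) \otimes |y\rangle\langle y| \in \mathcal{S}_\circ(ABY)$ and $\alpha \in (0,1) \cup (1,\infty)$. Then, the conditional entropies satisfy

\[ \mathbb{H}^{\downarrow}_\alpha(A|BY)_\rho = \frac{1}{1-\alpha} \log \Big( \sum_y \rho(y) \exp\big( (1-\alpha)\, \mathbb{H}^{\downarrow}_\alpha(A|B)_{\hat\rho(y)} \big) \Big), \tag{5.33} \]

\[ \mathbb{H}^{\uparrow}_\alpha(A|BY)_\rho = \frac{\alpha}{1-\alpha} \log \Big( \sum_y \rho(y) \exp\Big( \frac{1-\alpha}{\alpha}\, \mathbb{H}^{\uparrow}_\alpha(A|B)_{\hat\rho(y)} \Big) \Big). \tag{5.34} \]

(Here, $\mathbb{H}_\alpha$ is a substitute for $\bar{H}_\alpha$ or $\widetilde{H}_\alpha$.)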

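Proof. The first statement follows directly from (5.32) and the definition of the ‘$\downarrow$’-entropy. To show the second statement, recall that by definition,

$$\mathbb{H}^{\uparrow}_{\alpha}(A|BY)_\rho = \max_{\sigma_{BY} \in \mathcal{S}_\circ(BY)} -\mathbb{D}_\alpha(\rho_{ABY}\|I_A \otimes \sigma_{BY}), \tag{5.35}$$

where the maximum is over all (normalized) states $\sigma_{BY}$, but due to data processing (we can measure the $Y$-register, which does not affect $\rho_{ABY}$), we can restrict to states $\sigma_{BY}$ with classical $Y$, i.e. $\sigma_{BY} = \sum_y \sigma(y)\,\hat\sigma_B(y) \otimes |y\rangle\langle y|$. Using the decomposition of $\mathbb{D}_\alpha$ in (5.32), we then obtain

$$\begin{aligned}
\mathbb{H}^{\uparrow}_{\alpha}(A|BY)_\rho &= \max_{\{\sigma(y),\,\hat\sigma_B(y)\}_y} \frac{1}{1-\alpha}\log\Big(\sum_y \rho(y)^{\alpha}\sigma(y)^{1-\alpha}\exp\big((\alpha-1)\,\mathbb{D}_\alpha(\hat\rho_{AB}(y)\|I_A \otimes \hat\sigma_B(y))\big)\Big) \\
&= \max_{\{\sigma(y)\}_y} \frac{1}{1-\alpha}\log\Big(\sum_y \rho(y)^{\alpha}\sigma(y)^{1-\alpha}\exp\big((1-\alpha)\,\mathbb{H}^{\uparrow}_{\alpha}(A|B)_{\hat\rho(y)}\big)\Big).
\end{aligned} \tag{5.36}$$

Writing $r_y = \rho(y)\exp\big(\frac{1-\alpha}{\alpha}\,\mathbb{H}^{\uparrow}_{\alpha}(A|B)_{\hat\rho(y)}\big)$, and using a straightforward Lagrange multiplier technique, one can show that the maximum is attained by the distribution $\sigma(y) = r_y / \sum_z r_z$. Substituting this into the above equation leads to the desired relation. □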
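The optimizer claimed at the end of this proof can be checked numerically. The following is a minimal sketch and not part of the original argument: the weights `p`, the per-branch entropies `h_up`, and all helper names are hypothetical choices made for illustration. It evaluates the objective of (5.36) at $\sigma(y) = r_y/\sum_z r_z$ and at randomly drawn distributions.

```python
import math
import random

def objective(p, h_up, sigma, alpha):
    """Objective of (5.36) for a candidate distribution sigma(y)."""
    s = sum(py ** alpha * sy ** (1.0 - alpha) * math.exp((1.0 - alpha) * hy)
            for py, hy, sy in zip(p, h_up, sigma))
    return math.log(s) / (1.0 - alpha)

def closed_form(p, h_up, alpha):
    """Value of (5.34) and the claimed optimizer sigma(y) = r_y / sum_z r_z."""
    r = [py * math.exp((1.0 - alpha) / alpha * hy) for py, hy in zip(p, h_up)]
    norm = sum(r)
    return alpha / (1.0 - alpha) * math.log(norm), [ry / norm for ry in r]

def random_distribution(n, rng):
    """Strictly positive random probability vector of length n."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

# Hypothetical inputs: probabilities rho(y) and entropies of the conditional states.
p = [0.2, 0.5, 0.3]
h_up = [0.1, 0.7, 0.4]
rng = random.Random(0)

for alpha in (0.5, 3.0):
    value, sigma_star = closed_form(p, h_up, alpha)
    best_random = max(objective(p, h_up, random_distribution(3, rng), alpha)
                      for _ in range(2000))
    # The closed form should match the objective at sigma_star and
    # should not be exceeded by any random candidate.
    print(alpha, value, objective(p, h_up, sigma_star, alpha), best_random)
```

The random search is only a sanity check; it does not replace the Lagrange multiplier argument.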

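In particular, considering a state $\rho_{XY} = \sum_{x,y} \rho(x,y)\,|x\rangle\langle x| \otimes |y\rangle\langle y|$, we recover two notions of classical conditional Rényi entropy

$$H^{\downarrow}_{\alpha}(X|Y)_\rho = \frac{1}{1-\alpha}\log\Big(\sum_y \rho(y)\sum_x \rho(x|y)^{\alpha}\Big), \tag{5.37}$$

$$H^{\uparrow}_{\alpha}(X|Y)_\rho = \frac{\alpha}{1-\alpha}\log\Big(\sum_y \rho(y)\Big(\sum_x \rho(x|y)^{\alpha}\Big)^{\frac{1}{\alpha}}\Big), \tag{5.38}$$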


where the latter was originally suggested by Arimoto [7]. 
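As a small illustration of (5.37) and (5.38), the following sketch evaluates both classical conditional Rényi entropies for a joint distribution. The distribution `P_XY` and the function names are hypothetical and chosen only for this example; logarithms are natural.

```python
import math

def cond_renyi_down(p_xy, alpha):
    """H^down_alpha(X|Y) as in (5.37); p_xy[y][x] is the joint probability of (x, y)."""
    total = 0.0
    for row in p_xy:
        p_y = sum(row)
        if p_y > 0:
            total += p_y * sum((p / p_y) ** alpha for p in row if p > 0)
    return math.log(total) / (1.0 - alpha)

def cond_renyi_up(p_xy, alpha):
    """Arimoto's H^up_alpha(X|Y) as in (5.38)."""
    total = 0.0
    for row in p_xy:
        p_y = sum(row)
        if p_y > 0:
            inner = sum((p / p_y) ** alpha for p in row if p > 0)
            total += p_y * inner ** (1.0 / alpha)
    return alpha / (1.0 - alpha) * math.log(total)

# Hypothetical joint distribution: rows indexed by y, columns by x.
P_XY = [[0.30, 0.10],
        [0.05, 0.55]]

for alpha in (0.5, 2.0):
    down = cond_renyi_down(P_XY, alpha)
    up = cond_renyi_up(P_XY, alpha)
    print(f"alpha={alpha}: down={down:.4f}, up={up:.4f}")
```

Since the '↑' entropy is defined through an optimization over the conditioning state, it is never smaller than the '↓' entropy, which the printed values reflect.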


5.2.3 Data-Processing Inequalities and Concavity 

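Let us first discuss some important properties that immediately follow from the respective properties of the underlying divergence. First, the conditional Rényi entropies satisfy a data-processing inequality.

Corollary 5.1. For any channel $\mathcal{E} \in \mathrm{CPTP}(B,B')$ with $\tau_{AB'} = \mathcal{E}(\rho_{AB})$ for any state $\rho_{AB} \in \mathcal{S}_\circ(AB)$, we have

$$\bar{\mathbb{H}}_\alpha(A|B)_\rho \le \bar{\mathbb{H}}_\alpha(A|B')_\tau \quad\text{for}\quad \alpha \in [0,2], \tag{5.39}$$

$$\widetilde{\mathbb{H}}_\alpha(A|B)_\rho \le \widetilde{\mathbb{H}}_\alpha(A|B')_\tau \quad\text{for}\quad \alpha \ge \tfrac{1}{2}. \tag{5.40}$$

(Here, $\bar{\mathbb{H}}_\alpha$ is a substitute for either $\bar{H}^{\uparrow}_{\alpha}$ or $\bar{H}^{\downarrow}_{\alpha}$, and the same for $\widetilde{\mathbb{H}}_\alpha$.)

In particular, these entropies thus satisfy strong subadditivity in the form

$$\mathbb{H}_\alpha(A|BC)_\rho \le \mathbb{H}_\alpha(A|B)_\rho \tag{5.41}$$

for the respective ranges of $\alpha$.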

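Furthermore, it is easy to verify that these entropies are invariant under applications of local isometries on either the $A$ or $B$ systems. Moreover, for any sub-unital map $\mathcal{F} \in \mathrm{CPTP}(A,A')$ and $\tau_{A'B} = \mathcal{F}(\rho_{AB})$, we get

$$\bar{H}^{\downarrow}_{\alpha}(A'|B)_\tau = -\bar{D}_\alpha(\tau_{A'B}\|I_{A'} \otimes \tau_B) \ge -\bar{D}_\alpha(\tau_{A'B}\|\mathcal{F}(I_A) \otimes \tau_B) \tag{5.42}$$

$$\ge -\bar{D}_\alpha(\rho_{AB}\|I_A \otimes \rho_B) = \bar{H}^{\downarrow}_{\alpha}(A|B)_\rho, \tag{5.43}$$

and an analogous argument for the other entropies reveals $\mathbb{H}_\alpha(A'|B)_\tau \ge \mathbb{H}_\alpha(A|B)_\rho$ for all entropies with data-processing. Hence, sub-unital maps on $A$ do not decrease the uncertainty about $A$. However, note that the condition that the map be sub-unital is crucial, and counterexamples abound if it is not.

Finally, as for the divergence itself, the above data-processing inequalities remain valid if the maps $\mathcal{E}$ and $\mathcal{F}$ are trace non-increasing and $\mathrm{Tr}(\mathcal{E}(\rho)) = \mathrm{Tr}(\rho)$ and $\mathrm{Tr}(\mathcal{F}(\rho)) = \mathrm{Tr}(\rho)$, respectively.

As another consequence of the joint concavity of $\mathbb{Q}_\alpha$ for $\alpha < 1$, we find that $\rho \mapsto \bar{\mathbb{H}}_\alpha(A|B)_\rho$ is concave for all $\alpha \in [0,1]$. Moreover, it is quasi-concave for $\alpha \in [1,2]$. Similarly, $\rho \mapsto \widetilde{\mathbb{H}}_\alpha(A|B)_\rho$ is concave for all $\alpha \in [\tfrac{1}{2},1]$ and quasi-concave for $\alpha > 1$.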


5.3 Duality Relations and their Applications 

We have now introduced four different quantum conditional Rényi entropies. Here we show that these definitions are in fact related and complement each other via duality relations. It is well known that, for any tripartite pure state $\rho_{ABC}$, the relation

$$H(A|B)_\rho + H(A|C)_\rho = 0 \tag{5.44}$$

holds. We call this a duality relation for the conditional entropy. To see this, simply write $H(A|B)_\rho = H(\rho_{AB}) - H(\rho_B)$ and $H(A|C)_\rho = H(\rho_{AC}) - H(\rho_C)$ and verify, consulting the Schmidt decomposition, that the spectra of $\rho_{AB}$ and $\rho_C$ as well as the spectra of $\rho_B$ and $\rho_{AC}$ agree. The significance of this relation is manifold. For example, it turns out to be useful in cryptography, where the information an adversarial party, let us say C, has about a quantum system A can be estimated using local state tomography by two honest parties, A and B.

In the following, we are interested to see if such relations hold more generally for conditional Rényi entropies.


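5.3.1 Duality Relation for $\bar{H}^{\downarrow}_{\alpha}$

It was shown in [155] that $\bar{H}^{\downarrow}_{\alpha}$ indeed satisfies a duality relation.


Proposition 5.2. For any pure state $\rho_{ABC} \in \mathcal{S}_\circ(ABC)$, we have

$$\bar{H}^{\downarrow}_{\alpha}(A|B)_\rho + \bar{H}^{\downarrow}_{\beta}(A|C)_\rho = 0 \quad\text{when}\quad \alpha + \beta = 2,\ \alpha,\beta \in [0,2]. \tag{5.45}$$


Proof. By definition, we have $\bar{H}^{\downarrow}_{\alpha}(A|B)_\rho = \frac{1}{1-\alpha}\log \bar{Q}_\alpha(\rho_{AB}\|I_A \otimes \rho_B)$. Now, note that

$$\bar{Q}_\alpha(\rho_{AB}\|I_A \otimes \rho_B) = \mathrm{Tr}\big(\rho_{AB}^{\alpha}\rho_B^{1-\alpha}\big) = \mathrm{Tr}\big(\rho_{AB}^{\alpha-1}\,|\rho\rangle\langle\rho|_{ABC}\,\rho_B^{1-\alpha}\big) \tag{5.46}$$

$$= \mathrm{Tr}\big(\rho_C^{\alpha-1}\,|\rho\rangle\langle\rho|_{ABC}\,\rho_{AC}^{1-\alpha}\big) = \mathrm{Tr}\big(\rho_C^{\alpha-1}\rho_{AC}^{2-\alpha}\big). \tag{5.47}$$

The result then follows by substituting $\alpha = 2 - \beta$. □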

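Note that the map $\alpha \mapsto \beta = 2 - \alpha$ maps the interval $[0,2]$, where data-processing holds, onto itself. This is not surprising. Indeed, consider the Stinespring dilation $\mathcal{U} \in \mathrm{CPTP}(B,B'B'')$ of a quantum channel $\mathcal{E} \in \mathrm{CPTP}(B,B')$. Then, for $\rho_{ABC}$ pure, $\tau_{AB'B''C} = \mathcal{U}(\rho_{ABC})$ is also pure and the above duality relation implies that

$$\bar{H}^{\downarrow}_{\alpha}(A|B)_\rho \le \bar{H}^{\downarrow}_{\alpha}(A|B')_\tau \iff \bar{H}^{\downarrow}_{\beta}(A|C)_\rho \ge \bar{H}^{\downarrow}_{\beta}(A|B''C)_\tau. \tag{5.48}$$

Hence, data-processing for $\alpha$ holds if and only if data-processing for $\beta$ holds.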


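5.3.2 Duality Relation for $\widetilde{H}^{\uparrow}_{\alpha}$

It was shown in [15,122] that a similar relation holds for $\widetilde{H}^{\uparrow}_{\alpha}$, generalizing a well-known relation between the min- and max-entropies [101].


Proposition 5.3. For any pure state $\rho_{ABC} \in \mathcal{S}_\circ(ABC)$, we have

$$\widetilde{H}^{\uparrow}_{\alpha}(A|B)_\rho + \widetilde{H}^{\uparrow}_{\beta}(A|C)_\rho = 0 \quad\text{when}\quad \frac{1}{\alpha} + \frac{1}{\beta} = 2,\ \alpha,\beta \in \big[\tfrac{1}{2},\infty\big]. \tag{5.49}$$

Proof. Without loss of generality, we assume that $\alpha > 1$ and $\beta < 1$. Since $(0,1) \ni \alpha' := \frac{\alpha-1}{\alpha} = \frac{1-\beta}{\beta} =: -\beta'$, it suffices to show that

$$\min_{\sigma_B \in \mathcal{S}_\circ(B)} \big(\widetilde{Q}_\alpha(\rho_{AB}\|I_A \otimes \sigma_B)\big)^{\frac{1}{\alpha}} = \max_{\tau_C \in \mathcal{S}_\circ(C)} \big(\widetilde{Q}_\beta(\rho_{AC}\|I_A \otimes \tau_C)\big)^{\frac{1}{\beta}}, \tag{5.50}$$

or, equivalently, $\min_{\sigma_B \in \mathcal{S}_\circ(B)} \big\|\rho_{AB}^{1/2}\sigma_B^{-\alpha'}\rho_{AB}^{1/2}\big\|_\alpha = \max_{\tau_C \in \mathcal{S}_\circ(C)} \big\|\rho_{AC}^{1/2}\tau_C^{-\beta'}\rho_{AC}^{1/2}\big\|_\beta$. Now, leveraging the Hölder and reverse Hölder inequalities in Lemma 3.1, we find for any $M \in \mathcal{P}(A)$,

$$\|M\|_\alpha = \max\big\{\mathrm{Tr}(MN) : N \ge 0,\ \|N\|_{1/\alpha'} \le 1\big\} = \max_{\tau \in \mathcal{S}_\circ(A)} \mathrm{Tr}(M\tau^{\alpha'}), \text{ and} \tag{5.51}$$

$$\|M\|_\beta = \min\big\{\mathrm{Tr}(MN) : N \ge 0,\ N \gg M,\ \|N^{-1}\|_{-1/\beta'} \le 1\big\} = \min_{\sigma \in \mathcal{S}_\circ(A),\ \sigma \gg M} \mathrm{Tr}(M\sigma^{\beta'}). \tag{5.52}$$

In the last expression we can safely ignore operators $\sigma \not\gg M$ since those will certainly not achieve the minimum. Substituting this into the above expressions, we find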


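$$\big\|\rho_{AB}^{1/2}\sigma_B^{-\alpha'}\rho_{AB}^{1/2}\big\|_\alpha = \max_{\tau_{AB} \in \mathcal{S}_\circ(AB)} \mathrm{Tr}\big(\rho_{AB}^{1/2}\sigma_B^{-\alpha'}\rho_{AB}^{1/2}\,\tau_{AB}^{\alpha'}\big) \tag{5.53}$$


and, furthermore, choosing $\psi \in \mathcal{P}(ABC)$ to be the unnormalized maximally entangled state with regards to the Schmidt bases of $|\rho\rangle_{ABC}$ in the decomposition $AB:C$, we find

$$\max_{\tau_{AB} \in \mathcal{S}_\circ(AB)} \mathrm{Tr}\big(\rho_{AB}^{1/2}\sigma_B^{-\alpha'}\rho_{AB}^{1/2}\,\tau_{AB}^{\alpha'}\big) = \max_{\tau_C \in \mathcal{S}_\circ(C)} \langle\psi|\,\rho_{AB}^{1/2}\sigma_B^{-\alpha'}\rho_{AB}^{1/2} \otimes \tau_C^{\alpha'}\,|\psi\rangle \tag{5.54}$$

$$= \max_{\tau_C \in \mathcal{S}_\circ(C)} \langle\rho|\,\sigma_B^{-\alpha'} \otimes \tau_C^{\alpha'}\,|\rho\rangle. \tag{5.55}$$


An analogous argument also reveals that

$$\big\|\rho_{AC}^{1/2}\tau_C^{-\beta'}\rho_{AC}^{1/2}\big\|_\beta = \min_{\sigma_B \in \mathcal{S}_\circ(B)} \langle\rho|\,\sigma_B^{\beta'} \otimes \tau_C^{-\beta'}\,|\rho\rangle = \min_{\sigma_B \in \mathcal{S}_\circ(B)} \langle\rho|\,\sigma_B^{-\alpha'} \otimes \tau_C^{\alpha'}\,|\rho\rangle. \tag{5.56}$$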


At this point it only remains to show that the minimum over $\sigma_B$ and the maximum over $\tau_C$ can be interchanged. This can be verified using Sion’s minimax theorem [146], noting that $\langle\rho|\,\sigma_B^{-\alpha'} \otimes \tau_C^{\alpha'}\,|\rho\rangle_{ABC}$ is convex in $\sigma_B$ and concave in $\tau_C$, and we are optimizing over a compact convex space. □


We again note that the map $\alpha \mapsto \beta = \frac{\alpha}{2\alpha-1}$ maps $[\tfrac{1}{2},\infty]$ onto itself.


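5.3.3 Duality Relation for $\bar{H}^{\uparrow}_{\alpha}$ and $\widetilde{H}^{\downarrow}_{\alpha}$

The alternative expression in Lemma 5.1 leads us to the final duality relation, which establishes a surprising connection between two quantum Rényi entropies [154].

Proposition 5.4. For any pure state $\rho_{ABC} \in \mathcal{S}_\circ(ABC)$, we have

$$\bar{H}^{\uparrow}_{\alpha}(A|B)_\rho + \widetilde{H}^{\downarrow}_{\beta}(A|C)_\rho = 0 \quad\text{when}\quad \alpha\beta = 1,\ \alpha,\beta \in [0,\infty]. \tag{5.57}$$

Proof. First we note that $\beta = \frac{1}{\alpha}$ and thus $\frac{1}{1-\beta} = -\frac{\alpha}{1-\alpha}$. Then, using the expression in Lemma 5.1, it remains to show that

$$\mathrm{Tr}\Big(\big(\mathrm{Tr}_A(\rho_{AB}^{\alpha})\big)^{\frac{1}{\alpha}}\Big) = \mathrm{Tr}\Big(\big(\rho_C^{\alpha'}\rho_{AC}\,\rho_C^{\alpha'}\big)^{\frac{1}{\alpha}}\Big), \quad\text{where}\quad \alpha' = \frac{\alpha-1}{2}. \tag{5.58}$$

In the following we show something stronger, namely that the operators

$$\mathrm{Tr}_A(\rho_{AB}^{\alpha}) \quad\text{and}\quad \rho_C^{\alpha'}\rho_{AC}\,\rho_C^{\alpha'} \tag{5.59}$$

are unitarily equivalent. This is true since both of these operators are marginals (on $B$ and $AC$, respectively) of the same tripartite rank-1 operator, $\rho_C^{\alpha'}\rho_{ABC}\,\rho_C^{\alpha'}$. To see that this is indeed true, note the first operator in (5.59) can be rewritten as

$$\mathrm{Tr}_A(\rho_{AB}^{\alpha}) = \mathrm{Tr}_A\big(\rho_{AB}^{\alpha'}\rho_{AB}\,\rho_{AB}^{\alpha'}\big) = \mathrm{Tr}_{AC}\big(\rho_{AB}^{\alpha'}\rho_{ABC}\,\rho_{AB}^{\alpha'}\big) = \mathrm{Tr}_{AC}\big(\rho_C^{\alpha'}\rho_{ABC}\,\rho_C^{\alpha'}\big). \tag{5.60}$$

The last equality can be verified using the Schmidt decomposition of $\rho_{ABC}$ with regards to the partition $AB:C$. □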

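Again, note that the transformation $\alpha \mapsto \beta = \frac{1}{\alpha}$ maps the interval $[0,2]$ where data-processing holds for $\bar{H}^{\uparrow}_{\alpha}$ to the interval $[\tfrac{1}{2},\infty]$ where data-processing holds for $\widetilde{H}^{\downarrow}_{\beta}$, and vice versa.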
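The three parameter maps appearing in Propositions 5.2, 5.3 and 5.4 are easy to tabulate. The sketch below is illustrative only; the function names are hypothetical, and the conventions at the endpoints ($1/0 = \infty$, $1/\infty = 0$) are assumptions made to match the closed intervals stated in the propositions.

```python
import math

def dual_petz_down(alpha):
    """Prop. 5.2: alpha + beta = 2, for alpha in [0, 2]."""
    return 2.0 - alpha

def dual_sandwiched_up(alpha):
    """Prop. 5.3: 1/alpha + 1/beta = 2, i.e. beta = alpha / (2 alpha - 1), alpha >= 1/2."""
    if alpha == 0.5:
        return math.inf
    if math.isinf(alpha):
        return 0.5
    return alpha / (2.0 * alpha - 1.0)

def dual_mixed(alpha):
    """Prop. 5.4: alpha * beta = 1, for alpha in [0, infinity]."""
    if alpha == 0.0:
        return math.inf
    if math.isinf(alpha):
        return 0.0
    return 1.0 / alpha

for a in (0.5, 0.75, 1.0, 2.0, math.inf):
    print(a, dual_sandwiched_up(a), dual_mixed(a))
```

Each map fixes $\alpha = 1$, which is consistent with the duality relation (5.44) for the von Neumann entropy.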


5.3.4 Additivity for Tensor Product States 

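One implication of the duality relation for $\widetilde{H}^{\uparrow}_{\alpha}$ is that it allows us to show additivity for this quantity. Namely, we can use it to show the following corollary.


Corollary 5.2. For any product state $\rho_{AB} \otimes \tau_{A'B'}$ and $\alpha \in [\tfrac{1}{2},\infty)$, we have

$$\widetilde{H}^{\uparrow}_{\alpha}(AA'|BB')_{\rho\otimes\tau} = \widetilde{H}^{\uparrow}_{\alpha}(A|B)_\rho + \widetilde{H}^{\uparrow}_{\alpha}(A'|B')_\tau. \tag{5.61}$$

Proof. By definition of $\widetilde{H}^{\uparrow}_{\alpha}(AA'|BB')_{\rho\otimes\tau}$, we immediately find the following chain of inequalities:

$$\begin{aligned}
\widetilde{H}^{\uparrow}_{\alpha}(AA'|BB')_{\rho\otimes\tau} &= -\min_{\sigma_{BB'} \in \mathcal{S}_\circ(BB')} \widetilde{D}_\alpha(\rho_{AB} \otimes \tau_{A'B'}\|I_{AA'} \otimes \sigma_{BB'}) && (5.62) \\
&\ge -\min_{\sigma_B \in \mathcal{S}_\circ(B),\ \omega_{B'} \in \mathcal{S}_\circ(B')} \widetilde{D}_\alpha(\rho_{AB} \otimes \tau_{A'B'}\|I_A \otimes \sigma_B \otimes I_{A'} \otimes \omega_{B'}) && (5.63) \\
&= \widetilde{H}^{\uparrow}_{\alpha}(A|B)_\rho + \widetilde{H}^{\uparrow}_{\alpha}(A'|B')_\tau. && (5.64)
\end{aligned}$$

To establish the opposite inequality we introduce purifications $\rho_{ABC}$ of $\rho_{AB}$ and $\tau_{A'B'C'}$ of $\tau_{A'B'}$ and choose $\beta$ such that $\frac{1}{\alpha} + \frac{1}{\beta} = 2$. Then, an instance of the above inequality (5.62)-(5.64) reads

$$\widetilde{H}^{\uparrow}_{\beta}(AA'|CC')_{\rho\otimes\tau} \ge \widetilde{H}^{\uparrow}_{\beta}(A|C)_\rho + \widetilde{H}^{\uparrow}_{\beta}(A'|C')_\tau. \tag{5.65}$$

The duality relation in Prop. 5.3 then yields $\widetilde{H}^{\uparrow}_{\alpha}(AA'|BB')_{\rho\otimes\tau} \le \widetilde{H}^{\uparrow}_{\alpha}(A|B)_\rho + \widetilde{H}^{\uparrow}_{\alpha}(A'|B')_\tau$, concluding the proof. □

Finally, note that the corresponding additivity relations for $\bar{H}^{\downarrow}_{\alpha}$ and $\widetilde{H}^{\downarrow}_{\alpha}$ are evident from the respective definition. Additivity for $\bar{H}^{\uparrow}_{\alpha}$ in turn follows directly from the explicit expression established in Lemma 5.1.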
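For classical distributions, where the '↑' entropies reduce to the Arimoto form (5.38), additivity on product states can be confirmed directly. The following is a sanity-check sketch only; the distributions `P` and `Q` and the helper names are hypothetical.

```python
import math

def arimoto(p_xy, alpha):
    """Classical H^up_alpha(X|Y) of (5.38); p_xy[y][x] is the joint probability.

    Uses p(y) * (sum_x p(x|y)^alpha)^(1/alpha) = (sum_x p(x,y)^alpha)^(1/alpha).
    """
    total = sum(sum(p ** alpha for p in row) ** (1.0 / alpha) for row in p_xy)
    return alpha / (1.0 - alpha) * math.log(total)

def product(p, q):
    """Joint distribution of independent pairs: rows (y, y'), columns (x, x')."""
    return [[a * b for a in row_p for b in row_q] for row_p in p for row_q in q]

# Hypothetical joint distributions.
P = [[0.30, 0.10], [0.05, 0.55]]
Q = [[0.20, 0.20, 0.10], [0.05, 0.05, 0.40]]

for alpha in (0.5, 2.0, 5.0):
    lhs = arimoto(product(P, Q), alpha)
    rhs = arimoto(P, alpha) + arimoto(Q, alpha)
    print(alpha, lhs, rhs)
```

In the classical case the identity is immediate because the sum over $(y,y')$ factorizes; the quantum statement in Corollary 5.2 needs the duality argument given above.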


5.3.5 Lower and Upper Bounds on Quantum Renyi Entropy 

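The above duality relations also yield relations between different conditional Rényi entropies for arbitrary mixed states [154].


Corollary 5.3. Let $\rho_{AB} \in \mathcal{S}_\circ(AB)$. Then, the following holds for $\alpha \in [\tfrac{1}{2},\infty]$:

$$\widetilde{H}^{\uparrow}_{\alpha}(A|B)_\rho \le \bar{H}^{\uparrow}_{2-\frac{1}{\alpha}}(A|B)_\rho, \qquad \bar{H}^{\uparrow}_{\alpha}(A|B)_\rho \le \bar{H}^{\downarrow}_{2-\frac{1}{\alpha}}(A|B)_\rho, \tag{5.66}$$

$$\widetilde{H}^{\uparrow}_{\alpha}(A|B)_\rho \le \widetilde{H}^{\downarrow}_{2-\frac{1}{\alpha}}(A|B)_\rho, \qquad \widetilde{H}^{\downarrow}_{\alpha}(A|B)_\rho \le \bar{H}^{\downarrow}_{2-\frac{1}{\alpha}}(A|B)_\rho. \tag{5.67}$$


Proof. Consider an arbitrary purification $\rho_{ABC} \in \mathcal{S}_\circ(ABC)$ of $\rho_{AB}$. The relations of Fig. 5.1, for any $\gamma \ge 0$, applied to the marginal $\rho_{AC}$ are given as

$$\widetilde{H}^{\uparrow}_{\gamma}(A|C)_\rho \ge \widetilde{H}^{\downarrow}_{\gamma}(A|C)_\rho \ge \bar{H}^{\downarrow}_{\gamma}(A|C)_\rho, \text{ and} \tag{5.68}$$

$$\widetilde{H}^{\uparrow}_{\gamma}(A|C)_\rho \ge \bar{H}^{\uparrow}_{\gamma}(A|C)_\rho \ge \bar{H}^{\downarrow}_{\gamma}(A|C)_\rho. \tag{5.69}$$

We then substitute the corresponding dual entropies according to the duality relations in Sec. 5.3, which yields the desired inequalities upon appropriate new parametrization. □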

Some special cases of these inequalities are well known and have operational significance. For example, (5.67) for $\alpha = \infty$ states that $\widetilde{H}^{\uparrow}_{\infty}(A|B)_\rho \le \widetilde{H}^{\downarrow}_{2}(A|B)_\rho$, which relates the conditional min-entropy in (5.24) to the conditional collision entropy in (5.26). To understand this inequality more operationally we rewrite the conditional min-entropy as its dual semi-definite program [101] (see also Chapter 6),

$$\widetilde{H}^{\uparrow}_{\infty}(A|B)_\rho = \min_{\mathcal{E} \in \mathrm{CPTP}(B,A')} -\log\big(d_A\,F(\psi_{AA'},\mathcal{E}(\rho_{AB}))\big), \tag{5.70}$$

where $A'$ is a copy of $A$ and $\psi_{AA'}$ is the maximally entangled state on $A:A'$. Now, the above inequality becomes apparent since the conditional collision entropy can be written as [21]

$$\widetilde{H}^{\downarrow}_{2}(A|B)_\rho = -\log\big(d_A\,F(\psi_{AA'},\mathcal{E}^{\mathrm{pg}}(\rho_{AB}))\big), \tag{5.71}$$

where $\mathcal{E}^{\mathrm{pg}}$ denotes the pretty good recovery map of Barnum and Knill [13].

Finally, (5.66) for $\alpha = \tfrac{1}{2}$ yields $\widetilde{H}^{\uparrow}_{1/2}(A|B)_\rho \le \bar{H}^{\uparrow}_{0}(A|B)_\rho$, which relates the quantum conditional max-entropy in (5.25) to the quantum conditional generalization of the Hartley entropy in (5.27).


Dimension Bounds 

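First, note two particular inequalities from Corollary 5.3:

$$\widetilde{H}^{\downarrow}_{\infty}(A|B)_\rho \le \bar{H}^{\downarrow}_{2}(A|B)_\rho \quad\text{and}\quad \widetilde{H}^{\uparrow}_{1/2}(A|B)_\rho \le \bar{H}^{\uparrow}_{0}(A|B)_\rho. \tag{5.72}$$

From this and the monotonicity in $\alpha$, we find that all conditional entropies (that satisfy the data-processing inequality) can be upper and lower bounded as follows.

$$\widetilde{H}^{\downarrow}_{\infty}(A|B)_\rho \le \mathbb{H}_\alpha(A|B)_\rho \le \bar{H}^{\uparrow}_{0}(A|B)_\rho. \tag{5.73}$$

Thus, in order to find upper and lower bounds on quantum Rényi entropies it suffices to investigate these two quantities.

Lemma 5.2. Let $\rho_{AB} \in \mathcal{S}_\circ(AB)$. Then the following holds:

$$-\log\min\{\mathrm{rank}(\rho_A),\mathrm{rank}(\rho_B)\} \le \mathbb{H}_\alpha(A|B)_\rho \le \log\mathrm{rank}(\rho_A). \tag{5.74}$$

Moreover, $\mathbb{H}_\alpha(A|B)_\rho \ge 0$ if $\rho_{AB}$ is separable.


Proof. Without loss of generality (due to invariance under local isometries) we assume that $\rho_A$ and $\rho_B$ have full rank. The upper bound follows since $\bar{H}^{\uparrow}_{0}(A|B)_\rho \le H_0(A)_\rho = \log d_A$. Similarly, we find $\widetilde{H}^{\downarrow}_{\infty}(A|B)_\rho = -\bar{H}^{\uparrow}_{0}(A|C)_\rho \ge -H_0(A)_\rho = -\log d_A$ by taking into account an arbitrary purification $\rho_{ABC}$ of $\rho_{AB}$. On the other hand, for any decomposition $\rho_{AB} = \sum_i \lambda_i\,|\phi_i\rangle\langle\phi_i|$ into pure states, quasi-concavity of $\widetilde{H}^{\downarrow}_{\infty}$ (which is a direct consequence of the quasi-convexity of $\widetilde{D}_\infty$) yields

$$\widetilde{H}^{\downarrow}_{\infty}(A|B)_\rho \ge \min_i \widetilde{H}^{\downarrow}_{\infty}(A|B)_{\phi_i} = \min_i -H_0(A)_{\phi_i} \ge -\log d_B. \tag{5.75}$$

This concludes the proof of the first statement.

For separable states, we may write

$$\rho_{AB} = \sum_k p_k\,\sigma_A^k \otimes \tau_B^k \le \sum_k p_k\,I_A \otimes \tau_B^k = I_A \otimes \rho_B, \tag{5.76}$$

and, hence, $\widetilde{H}^{\downarrow}_{\infty}(A|B)_\rho = \sup\{\lambda \in \mathbb{R} : \rho_{AB} \le \exp(-\lambda)\,I_A \otimes \rho_B\} \ge 0$. □


5.4 Chain Rules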

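The chain rule, $H(AB|C) = H(A|BC) + H(B|C)$, is fundamentally important in many applications because it allows us to see the entropy of a system as the sum of the entropies of its parts. However, the analogous relation $\mathbb{H}_\alpha(AB|C) = \mathbb{H}_\alpha(A|BC) + \mathbb{H}_\alpha(B|C)$ generally does not hold for $\alpha \ne 1$. Nonetheless, there exist weaker statements that we can prove.

For a first such statement, we note that for any $\rho_{ABC} \in \mathcal{S}_\circ(ABC)$, the inequality

$$\rho_{BC} \le \exp\big(-\widetilde{H}^{\downarrow}_{\infty}(B|C)_\rho\big)\,I_B \otimes \rho_C \tag{5.77}$$

holds by definition of $\widetilde{H}^{\downarrow}_{\infty}$. Hence, using the dominance relation of the Rényi divergence, we find

$$\widetilde{H}^{\downarrow}_{\alpha}(A|BC)_\rho = -\widetilde{D}_\alpha(\rho_{ABC}\|I_A \otimes \rho_{BC}) \tag{5.78}$$

$$\le -\widetilde{D}_\alpha(\rho_{ABC}\|I_{AB} \otimes \rho_C) - \widetilde{H}^{\downarrow}_{\infty}(B|C)_\rho, \tag{5.79}$$

or, equivalently, $\widetilde{H}^{\downarrow}_{\alpha}(AB|C)_\rho \ge \widetilde{H}^{\downarrow}_{\alpha}(A|BC)_\rho + \widetilde{H}^{\downarrow}_{\infty}(B|C)_\rho$. Using an analogous argument we get the same statement also for $\bar{H}^{\downarrow}_{\alpha}$.


Proposition 5.5. For any state $\rho_{ABC} \in \mathcal{S}_\circ(ABC)$, we have

$$\mathbb{H}^{\downarrow}_{\alpha}(AB|C)_\rho \ge \mathbb{H}^{\downarrow}_{\alpha}(A|BC)_\rho + \widetilde{H}^{\downarrow}_{\infty}(B|C)_\rho. \tag{5.80}$$


Several other variations of the chain rule can now be established using the duality relations, for example

$$\bar{H}^{\uparrow}_{\alpha}(AB|C)_\rho \le \bar{H}^{\uparrow}_{0}(A|BC)_\rho + \bar{H}^{\uparrow}_{\alpha}(B|C)_\rho. \tag{5.81}$$

Next, let us try to find a chain rule that only involves entropies of the ‘$\uparrow$’ type. For this purpose, we follow the above argument but start with the fact that

$$\rho_{BC} \le \exp\big(-\widetilde{H}^{\uparrow}_{\infty}(B|C)_\rho\big)\,I_B \otimes \sigma_C \tag{5.82}$$

for some $\sigma_C \in \mathcal{S}_\circ(C)$. This yields the relation

$$\widetilde{H}^{\uparrow}_{\alpha}(AB|C)_\rho \ge \widetilde{H}^{\downarrow}_{\alpha}(A|BC)_\rho + \widetilde{H}^{\uparrow}_{\infty}(B|C)_\rho \tag{5.83}$$

and we can use the inequality in (5.67) to remove the remaining ‘$\downarrow$’. This leads to

$$\widetilde{H}^{\uparrow}_{\alpha}(AB|C)_\rho \ge \widetilde{H}^{\uparrow}_{\beta}(A|BC)_\rho + \widetilde{H}^{\uparrow}_{\infty}(B|C)_\rho, \qquad \alpha = 2 - \frac{1}{\beta}. \tag{5.84}$$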

This result is a special case of a beautiful set of chain rules for $\widetilde{H}^{\uparrow}_{\alpha}$ that were recently established by Dupuis [49].

Theorem 5.1. Let $\rho_{ABC} \in \mathcal{S}_\circ(ABC)$ and $\alpha,\beta,\gamma \in (\tfrac{1}{2},1) \cup (1,\infty)$ such that $\frac{\alpha}{\alpha-1} = \frac{\beta}{\beta-1} + \frac{\gamma}{\gamma-1}$. Then, if $(\alpha-1)(\beta-1)(\gamma-1) > 0$,

$$\widetilde{H}^{\uparrow}_{\alpha}(AB|C)_\rho \ge \widetilde{H}^{\uparrow}_{\beta}(A|BC)_\rho + \widetilde{H}^{\uparrow}_{\gamma}(B|C)_\rho, \tag{5.85}$$

and the inequality is reversed if $(\alpha-1)(\beta-1)(\gamma-1) < 0$.

The proof in [49] is outside the scope of this book (see also Beigi [15]). The chain rules for the von Neumann entropy follow as a limit of the above relation. For example, if we choose $\beta = \gamma = 1 + 2\varepsilon$ so that $\alpha = \frac{1+2\varepsilon}{1+\varepsilon}$ for a small parameter $\varepsilon \to 0$, we recover the relation

$$H(AB|C)_\rho \ge H(A|BC)_\rho + H(B|C)_\rho. \tag{5.86}$$

The opposite inequality follows by choosing $\beta = \gamma = 1 - 2\varepsilon$.
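The parameter constraint of Theorem 5.1 can be solved for $\alpha$ explicitly, which makes the two limits used above easy to check. The sketch below is illustrative; the helper names are hypothetical, and the value $\gamma = \infty$ is used only as the limiting case that reproduces the relation $\alpha = 2 - 1/\beta$ of (5.84), since the theorem itself is stated for finite parameters.

```python
import math

def ratio(x):
    """x / (x - 1), with the limiting value 1 for x = infinity."""
    return 1.0 if math.isinf(x) else x / (x - 1.0)

def alpha_from(beta, gamma):
    """Solve alpha/(alpha-1) = beta/(beta-1) + gamma/(gamma-1) for alpha."""
    s = ratio(beta) + ratio(gamma)
    return s / (s - 1.0)

def direction(alpha, beta, gamma):
    """Return '>=' if (5.85) holds as stated and '<=' if it is reversed."""
    return ">=" if (alpha - 1.0) * (beta - 1.0) * (gamma - 1.0) > 0 else "<="

eps = 0.01
a_plus = alpha_from(1 + 2 * eps, 1 + 2 * eps)    # expected (1 + 2 eps) / (1 + eps)
a_minus = alpha_from(1 - 2 * eps, 1 - 2 * eps)   # all three parameters below 1
print(a_plus, direction(a_plus, 1 + 2 * eps, 1 + 2 * eps))
print(a_minus, direction(a_minus, 1 - 2 * eps, 1 - 2 * eps))
print(alpha_from(2.0, math.inf))                 # limiting case alpha = 2 - 1/beta
```

As $\varepsilon \to 0$ all three parameters tend to 1 from the same side, so the two choices give the two directions of the von Neumann chain rule.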

Finally, we want to stress that slightly stronger chain rules are sometimes possible when the 
underlying state has structure. 


Entropy of Classical Information 

We explore this with the example of classical and coherent-classical quantum states, which arise when we purify classical systems. For concreteness, consider a state $\rho \in \mathcal{S}_\bullet(XAB)$ that is classical on $X$, and a purification of the form

$$\rho_{XX'ABC} := \sum_{x,x'} |x'\rangle\langle x|_X \otimes |x'\rangle\langle x|_{X'} \otimes |\rho(x')\rangle\langle\rho(x)|_{ABC}, \tag{5.87}$$

where $\rho_{ABC}(x)$ is a purification of $\rho_{AB}(x)$. We say that $\rho_{XX'ABC}$ is coherent-classical between $X$ and $X'$: if one of these systems is traced out the remaining states are isomorphic and classical on $X$ or $X'$, respectively.

Lemma 5.3. Let $\rho \in \mathcal{S}_\bullet(XX'AB)$ be coherent-classical between $X$ and $X'$. Then,

$$\mathbb{H}^{\uparrow}_{\alpha}(XA|X'B)_\rho \le \mathbb{H}^{\uparrow}_{\alpha}(A|XX'B)_\rho \quad\text{and}\quad \widetilde{\mathbb{H}}_\alpha(XA|B)_\rho \ge \widetilde{\mathbb{H}}_\alpha(A|B)_\rho. \tag{5.88}$$

The second statement reveals that classical information has non-negative entropy, regardless of the nature of the state on $AB$. (Note that Lemma 5.2 already established this fact for the case where $A$ is trivial.)

Proof. We will establish the first inequality for all conditional Rényi entropies of the type ‘$\uparrow$’. The second inequality then follows by the respective duality relations, and a relabelling $B \leftrightarrow C$.

Let $\alpha < 1$ such that $\mathbb{Q}_\alpha$ is jointly concave and the data-processing inequality ensures that $\mathbb{Q}_\alpha$ is non-decreasing under TPCP maps. Then define the projector $\Pi_{XX'} = \sum_x |x\rangle\langle x|_X \otimes |x\rangle\langle x|_{X'}$. Clearly, $\rho_{XX'AB} = \Pi_{XX'}\,\rho_{XX'AB}\,\Pi_{XX'}$. Hence, for any $\sigma \in \mathcal{S}_\circ(X'B)$, the data-processing inequality yields

$$\mathbb{Q}_\alpha(\rho_{XX'AB}\|I_{XA} \otimes \sigma_{X'B}) \le \mathbb{Q}_\alpha\big(\rho_{XX'AB}\|I_A \otimes \Pi_{XX'}(I_X \otimes \sigma_{X'B})\Pi_{XX'}\big) \tag{5.89}$$

$$\le \max_{\sigma \in \mathcal{S}_\circ(XX'B)} \mathbb{Q}_\alpha(\rho_{XX'AB}\|I_A \otimes \sigma_{XX'B}), \tag{5.90}$$

where we used that $\mathrm{Tr}\big(\Pi_{XX'}(I_X \otimes \sigma_{X'B})\Pi_{XX'}\big) = \mathrm{Tr}(\sigma_{X'B}) = 1$. We conclude that the desired statement holds for $\alpha < 1$, and for $\alpha > 1$ an analogous argument with opposite inequalities applies. □

Finally, the following result gives dimension-dependent bounds on how much information a 
classical register can contain. 

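Lemma 5.4. Let $\rho \in \mathcal{S}_\bullet(XAB)$ be classical on $X$. Then,

$$\mathbb{H}^{\uparrow}_{\alpha}(XA|B)_\rho \le \mathbb{H}^{\uparrow}_{\alpha}(A|XB)_\rho + \log d_X. \tag{5.91}$$

Proof. Simply note that for any $\sigma_B \in \mathcal{S}_\circ(B)$, we have

$$\mathbb{D}_\alpha(\rho_{XAB}\|I_{XA} \otimes \sigma_B) = \mathbb{D}_\alpha(\rho_{AXB}\|I_A \otimes (\pi_X \otimes \sigma_B)) - \log d_X \tag{5.92}$$

$$\ge \min_{\sigma_{XB} \in \mathcal{S}_\circ(XB)} \mathbb{D}_\alpha(\rho_{AXB}\|I_A \otimes \sigma_{XB}) - \log d_X. \tag{5.93}$$


For example, combining the above two lemmas, we find that

$$\widetilde{H}^{\uparrow}_{\alpha}(A|B)_\rho \le \widetilde{H}^{\uparrow}_{\alpha}(AX|B)_\rho \le \widetilde{H}^{\uparrow}_{\alpha}(A|BX)_\rho + \log d_X. \tag{5.94}$$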


5.5 Background and Further Reading 

Strong subadditivity (5.6) was first conjectured by Lanford and Robinson in [104]. Its first proof 
by Lieb and Ruskai [107] is one of the most celebrated results in quantum information theory. The 
original proof is based on Lieb’s theorem [106]. Simpler proofs were subsequently presented by 
Nielsen and Petz [126] and Ruskai [143], amongst others. In this book we proved this statement 
indirectly via the data-processing inequality for the relative entropy, which in turn follows by
continuity from the data-processing inequality for the Renyi divergence in Chapter 4. We also 
provide an elementary proof in Appendix A. 

The classical version of $H^{\uparrow}_{\alpha}$ was introduced by Arimoto for an evaluation of the guessing probability [7]. Gallager used $H^{\uparrow}_{\alpha}$ to upper bound the decoding error probability of a random coding scheme for data compression with side-information [64]. More recently, the classical and the classical-quantum special cases of $\bar{H}^{\uparrow}_{\alpha}$ were investigated by Hayashi (see, for example, [79]).

The quantum conditional Rényi entropy $\bar{H}^{\downarrow}_{\alpha}$ was first studied in [155]. We note that the expression for $\bar{H}^{\uparrow}_{\alpha}$ in Lemma 5.1 can be derived using a quantum Sibson’s identity, first proposed by Sharma and Warsi [145]. On the other hand, the quantum Rényi entropy $\widetilde{H}^{\uparrow}_{\alpha}$ was proposed in [153] and investigated in [122], whereas $\widetilde{H}^{\downarrow}_{\alpha}$ was first considered in [154].

It is an open question whether the inequalities in Corollary 5.3 also hold for the Rényi divergences themselves. Relatedly, Mosonyi [118] used a converse of the Araki-Lieb-Thirring trace inequality due to Audenaert [8] to find a converse to the ordering relation $\bar{D}_\alpha(\rho\|\sigma) \ge \widetilde{D}_\alpha(\rho\|\sigma)$, namely

$$\widetilde{D}_\alpha(\rho\|\sigma) \ge \alpha\,\bar{D}_\alpha(\rho\|\sigma) + \log\mathrm{Tr}(\rho^{\alpha}) + (\alpha-1)\log\|\sigma\|. \tag{5.95}$$

In this book we focus our attention on conditional Rényi entropies, but similar techniques can also be used to explore Rényi generalizations of the mutual information [68,80] and conditional mutual information [24].


Chapter 6 

Smooth Entropy Calculus 


Smooth Rényi entropies are defined as optimizations (either minimizations or maximizations) of Rényi entropies over a set of close states. For many applications it suffices to consider just two smooth Rényi entropies: the smooth min-entropy acts as a representative of all conditional Rényi entropies with $\alpha > 1$, whereas the smooth max-entropy acts as a representative for all Rényi entropies with $\alpha < 1$. These two entropies have particularly nice properties and can be expressed
in various different ways, for example as semi-definite optimization problems. Most importantly, 
they give rise to an entropic (and fully quantum) version of the asymptotic equipartition property, 
which states that both the (regularized) smooth min- and max-entropies converge to the condi¬ 
tional von Neumann entropy for iid product states. This is because smoothing implicitly allows 
us to restrict our attention to a typical subspace where all conditional Renyi entropies coincide 
with the von Neumann entropy. Furthermore, we will see that the smooth entropies inherit many 
properties of the underlying Renyi entropies. 


6.1 Min- and Max-Entropy 

This section develops a variety of useful alternative expressions for the min- and max-entropies, $\widetilde{H}^{\uparrow}_{\infty}$ and $\widetilde{H}^{\uparrow}_{1/2}$. In particular, we express both the min- and the max-entropy in terms of semi-definite programs.


6.1.1 Semi-Definite Programs 

Optimization problems that can be formulated as semi-definite programs are particularly interest¬ 
ing because they have a rich structure and efficient numerical solvers. Here we present a formu¬ 
lation of semi-definite programs that has a very symmetric structure, following Watrous’ lecture 
notes [171]. 

Definition 6.1. A semi-definite program (SDP) is a triple $\{K, L, \mathcal{E}\}$, where $K \in \mathcal{L}^{\dagger}(A)$, $L \in \mathcal{L}^{\dagger}(B)$ and $\mathcal{E} \in \mathcal{L}(\mathcal{L}(A),\mathcal{L}(B))$ is a super-operator from $A$ to $B$ that preserves self-adjointness. The following two optimization problems are associated with the semi-definite program:

$$\begin{array}{llll}
\text{primal problem} & & \text{dual problem} & \\
\text{minimize:} & \mathrm{Tr}(KX) & \text{maximize:} & \mathrm{Tr}(LY) \\
\text{subject to:} & \mathcal{E}(X) \ge L & \text{subject to:} & \mathcal{E}^{\dagger}(Y) \le K \\
 & X \in \mathcal{P}(A) & & Y \in \mathcal{P}(B)
\end{array} \tag{6.1}$$

We call an operator $X \in \mathcal{P}(A)$ primal feasible if it satisfies $\mathcal{E}(X) \ge L$. Similarly, we say that $Y \in \mathcal{P}(B)$ is dual feasible if $\mathcal{E}^{\dagger}(Y) \le K$. Moreover, we denote the optimal value of the primal problem by $\alpha$ and the optimal value of the dual problem by $\beta$. Formally, we define

$$\alpha = \inf\{\mathrm{Tr}(KX) : X \in \mathcal{P}(A),\ \mathcal{E}(X) \ge L\}, \tag{6.2}$$

$$\beta = \sup\{\mathrm{Tr}(LY) : Y \in \mathcal{P}(B),\ \mathcal{E}^{\dagger}(Y) \le K\}. \tag{6.3}$$

The following two statements are true for any SDP and provide a relation between the primal and dual problem. The first fact is called weak duality, and the second statement is also known as Slater’s condition for strong duality.

Weak Duality: We have $\alpha \ge \beta$.

Strong Duality: If $\alpha$ is finite and there exists an operator $Y > 0$ such that $\mathcal{E}^{\dagger}(Y) < K$, then $\alpha = \beta$ and there exists a primal feasible $X$ such that $\mathrm{Tr}(KX) = \alpha$.

For a proof we defer to [171]. As an immediate consequence, this implies that every dual feasible operator $Y$ provides a lower bound of $\mathrm{Tr}(LY)$ on $\alpha$ and every primal feasible operator $X$ provides an upper bound of $\mathrm{Tr}(KX)$ on $\beta$.


6.1.2 The Min-Entropy 

We first recall the expression for $\widetilde{H}^{\uparrow}_{\infty}$ in (5.24), which we will simply call min-entropy in this chapter. We extend the definition to include sub-normalized states [139].


Definition 6.2. Let $\rho_{AB} \in \mathcal{S}_\bullet(AB)$. The min-entropy of $A$ conditioned on $B$ of the state $\rho_{AB}$ is

$$H_{\min}(A|B)_\rho = \sup_{\sigma_B \in \mathcal{S}_\bullet(B)} \sup\{\lambda \in \mathbb{R} : \rho_{AB} \le \exp(-\lambda)\,I_A \otimes \sigma_B\}. \tag{6.4}$$


Let us take a closer look at the inner supremum first. Note that there exists a feasible $\lambda$ if and only if $\sigma_B \gg \rho_B$. However, if this condition on the support is satisfied, then using the generalized inverse, we find that

$$\lambda^{*} = -\log\big\|\sigma_B^{-1/2}\rho_{AB}\,\sigma_B^{-1/2}\big\|_\infty \tag{6.5}$$

is feasible and achieves the maximum. The min-entropy can thus alternatively be written as

$$H_{\min}(A|B)_\rho = \max_{\sigma_B} -\log\big\|\sigma_B^{-1/2}\rho_{AB}\,\sigma_B^{-1/2}\big\|_\infty, \tag{6.6}$$

where we use the generalized inverse and the maximum is taken over all $\sigma_B \in \mathcal{S}_\bullet(B)$ with $\sigma_B \gg \rho_B$. We can also reformulate (6.4) as a semi-definite program.

For this purpose, we include the factor $\exp(-\lambda)$ in $\sigma_B$ and allow $\sigma_B$ to be an arbitrary positive semi-definite operator. The min-entropy can then be written as

$$H_{\min}(A|B)_\rho = -\log\min\{\mathrm{Tr}(\sigma_B) : \sigma_B \in \mathcal{P}(B) \wedge \rho_{AB} \le I_A \otimes \sigma_B\}. \tag{6.7}$$


In particular, we consider the following semi-definite optimization problem for the expression $\exp(-H_{\min}(A|B)_\rho)$, which has an efficient numerical solver.

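Lemma 6.1. Let $\rho_{AB} \in \mathcal{S}_\bullet(AB)$. Then, the following two optimization problems satisfy strong duality and both evaluate to $\exp(-H_{\min}(A|B)_\rho)$.

$$\begin{array}{llll}
\text{primal problem} & & \text{dual problem} & \\
\text{minimize:} & \mathrm{Tr}(\sigma_B) & \text{maximize:} & \mathrm{Tr}(\rho_{AB}X_{AB}) \\
\text{subject to:} & I_A \otimes \sigma_B \ge \rho_{AB} & \text{subject to:} & \mathrm{Tr}_A[X_{AB}] \le I_B \\
 & \sigma_B \ge 0 & & X_{AB} \ge 0
\end{array} \tag{6.8}$$

Proof. Clearly, the dual problem has a finite solution; in fact, we always have $\mathrm{Tr}[\rho_{AB}X_{AB}] \le \mathrm{Tr}\,X_{AB} \le d_B$. Furthermore, there exists a $\sigma_B > 0$ with $I_A \otimes \sigma_B > \rho_{AB}$. Hence, strong duality applies and the values of the primal and dual problems are equal. □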
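For a classical state the two programs in (6.8) can be examined by hand. The sketch below makes the simplifying assumption that $\rho_{AB}$ is diagonal with entries $p(x,y)$ and that we only look at diagonal candidates $\sigma_B$ and $X_{AB}$, so that the SDP reduces to a linear program; the distribution `P_XY` and all function names are hypothetical. It exhibits a primal feasible and a dual feasible point with the same objective value, which by weak duality certifies that both are optimal.

```python
import math

def primal_candidate(p_xy):
    """Diagonal primal point: sigma(y) = max_x p(x, y), so sigma(y) >= p(x, y) for all x."""
    return [max(row) for row in p_xy]

def dual_candidate(p_xy):
    """Diagonal dual point: X(x, y) = 1 for one maximizing x per y, else 0."""
    x_mat = []
    for row in p_xy:
        best = row.index(max(row))
        x_mat.append([1.0 if x == best else 0.0 for x in range(len(row))])
    return x_mat

def is_primal_feasible(p_xy, sigma):
    """Check I_A (x) sigma_B >= rho_AB entrywise for diagonal operators."""
    return all(s >= p for row, s in zip(p_xy, sigma) for p in row)

def is_dual_feasible(x_mat):
    """Check Tr_A[X_AB] <= I_B and X_AB >= 0 for diagonal operators."""
    return all(sum(row) <= 1.0 and min(row) >= 0.0 for row in x_mat)

# Hypothetical joint distribution p(x, y): rows indexed by y, columns by x.
P_XY = [[0.30, 0.10],
        [0.05, 0.55]]

sigma = primal_candidate(P_XY)
x_mat = dual_candidate(P_XY)
primal_value = sum(sigma)
dual_value = sum(p * x for p_row, x_row in zip(P_XY, x_mat)
                 for p, x in zip(p_row, x_row))
h_min = -math.log(primal_value)
print(primal_value, dual_value, h_min)
```

Both values equal $\sum_y \max_x p(x,y)$, which is the classical guessing probability discussed in Section 6.1.4.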

Let us investigate the dual problem next. We can replace the inequality in the condition $X_B \le I_B$ by an equality since adding a positive part to $X_{AB}$ can only increase $\mathrm{Tr}(\rho_{AB}X_{AB})$. Hence, $X_{AB}$ can be interpreted as a Choi-Jamiolkowski state of a unital CP map (cf. Sec. 2.6.4) from $\mathcal{H}_{A'}$ to $\mathcal{H}_B$. Let $\mathcal{E}^{\dagger}$ be that map, then

$$\exp(-H_{\min}(A|B)_\rho) = \max_{\mathcal{E}^{\dagger}} \mathrm{Tr}\big(\rho_{AB}\,\mathcal{E}^{\dagger}(\Psi_{AA'})\big) = d_A \max_{\mathcal{E}} \mathrm{Tr}\big(\mathcal{E}[\rho_{AB}]\,\psi_{AA'}\big), \tag{6.9}$$

where the second maximization is over all $\mathcal{E} \in \mathrm{CPTP}(B,A')$, i.e. all maps whose adjoint is completely positive and unital from $A'$ to $B$. The fully entangled state $\psi_{AA'} = \Psi_{AA'}/d_A$ is pure and normalized, and if $\rho_{AB} \in \mathcal{S}_\circ(AB)$ is normalized as well, we can rewrite the above expression in terms of the fidelity [101]

$$H_{\min}(A|B)_\rho = -\log\Big(d_A \max_{\mathcal{E} \in \mathrm{CPTP}(B,A')} F\big(\mathcal{E}(\rho_{AB}),\psi_{AA'}\big)\Big) \ge -\log d_A. \tag{6.10}$$

(Note that $\psi$ is defined as the fully entangled state in an arbitrary but fixed basis of $\mathcal{H}_A$ and $\mathcal{H}_{A'}$. The expression is invariant under the choice of basis, since the fully entangled states can be converted into each other by an isometry appended to $\mathcal{E}$.)















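Alternatively, we can interpret $X_{AB}$ as the Choi-Jamiolkowski state of a trace-preserving completely positive map from $\mathcal{H}_{B'}$ to $\mathcal{H}_A$, leading to

$$H_{\min}(A|B)_\rho = -\log\Big(d_B \max_{\mathcal{E} \in \mathrm{CPTP}(B',A)} \mathrm{Tr}\big(\rho_{AB}\,\mathcal{E}(\psi_{BB'})\big)\Big) \ge -\log d_B. \tag{6.11}$$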


6.1.3 The Max-Entropy 

We use the following definition of the max-entropy, which coincides with $\widetilde{H}^{\uparrow}_{1/2}$ in the case where $\rho_{AB}$ is normalized.

Definition 6.3. Let $\rho_{AB} \in \mathcal{S}_\bullet(AB)$. The max-entropy of $A$ conditioned on $B$ of the state $\rho_{AB}$ is

$$H_{\max}(A|B)_\rho := \max_{\sigma_B \in \mathcal{S}_\bullet(B)} \log F(\rho_{AB}, I_A \otimes \sigma_B). \tag{6.12}$$

Clearly, the maximum is taken for a normalized state in $\mathcal{S}_\circ(B)$. However, note that the fidelity term is not linear in $\sigma_B$, and thus this cannot directly be interpreted as an SDP. This can be overcome by introducing an arbitrary purification $\rho_{ABC}$ of $\rho_{AB}$ and applying Uhlmann’s theorem, which yields

$$\exp(H_{\max}(A|B)_\rho) = d_A \max_{\tau_{ABC} \in \mathcal{S}_\bullet(ABC)} \langle\rho_{ABC}|\,\tau_{ABC}\,|\rho_{ABC}\rangle, \tag{6.13}$$

where $\tau_{ABC}$ has marginal $\tau_{AB} = \pi_A \otimes \sigma_B$ for some $\sigma_B \in \mathcal{S}_\bullet(B)$. This is the dual problem of a semi-definite program.

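Lemma 6.2. Let $\rho_{AB} \in \mathcal{S}_\bullet(AB)$. Then, the following two optimization problems satisfy strong duality and both evaluate to $\exp(H_{\max}(A|B)_\rho)$.

$$\begin{array}{llll}
\text{primal problem} & & \text{dual problem} & \\
\text{minimize:} & \mu & \text{maximize:} & \mathrm{Tr}(\rho_{ABC}Y_{ABC}) \\
\text{subject to:} & \mu I_B \ge \mathrm{Tr}_A(Z_{AB}) & \text{subject to:} & \mathrm{Tr}_C(Y_{ABC}) \le I_A \otimes \sigma_B \\
 & Z_{AB} \otimes I_C \ge \rho_{ABC} & & \mathrm{Tr}(\sigma_B) \le 1 \\
 & Z_{AB} \ge 0,\ \mu \ge 0 & & Y_{ABC} \ge 0,\ \sigma_B \ge 0
\end{array} \tag{6.14}$$

Proof. The dual problem has a finite solution, $\mathrm{Tr}(Y_{ABC}) \le d_A$, and hence the maximum cannot exceed $d_A$. There are also primal feasible points with $Z_{AB} \otimes I_C > \rho_{ABC}$ and $\mu I_B > Z_B$. □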

The primal problem can be rewritten by noting that the optimization over $\mu$ corresponds to evaluating the operator norm of $Z_B$.

$$H_{\max}(A|B)_\rho = \log\min\big\{\|Z_B\|_\infty : Z_{AB} \otimes I_C \ge \rho_{ABC},\ Z_{AB} \in \mathcal{P}(AB)\big\}. \tag{6.15}$$

To arrive at this SDP we introduced a purification of $\rho_{AB}$, and consequently (6.15) depends on $\rho_{ABC}$ as well. This can be avoided by choosing a different SDP for the fidelity.

Lemma 6.3. For all $\rho_{AB} \in \mathcal{S}_\bullet(AB)$, we have

$$\exp(H_{\max}(A|B)_\rho) = \inf_{Y_{AB} > 0} \mathrm{Tr}\big(\rho_{AB}Y_{AB}^{-1}\big)\,\|Y_B\|_\infty. \tag{6.16}$$

This can be interpreted as the Alberti form [1] of the max-entropy. Its proof is based on an SDP formulation of the fidelity due to Watrous [172] and Killoran [98].

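Proof. From [98,172] we learn that $\max_{\sigma_B \in \mathcal{S}_\bullet(B)} \sqrt{F(\rho_{AB}, I_A \otimes \sigma_B)}$ equals the dual problem of the following SDP:

$$\begin{array}{llll}
\text{primal problem} & & \text{dual problem} & \\
\text{minimize:} & \tfrac{1}{2}\mathrm{Tr}(\rho_{AB}Y_{11}) + \tfrac{1}{2}\gamma & \text{maximize:} & \tfrac{1}{2}(\mathrm{Tr}\,X_{12} + \mathrm{Tr}\,X_{21}) \\
\text{subject to:} & \gamma I_B \ge \mathrm{Tr}_A(Y_{22}) & \text{subject to:} & X_{11} \le \rho_{AB} \\
 & \begin{pmatrix} Y_{11} & 0 \\ 0 & Y_{22} \end{pmatrix} \ge \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix} & & X_{22} \le I_A \otimes \sigma_B \\
 & Y_{11} \ge 0,\ Y_{22} \ge 0,\ \gamma \ge 0 & & \mathrm{Tr}(\sigma_B) \le 1 \\
 & & & \begin{pmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{pmatrix} \ge 0,\ \sigma_B \ge 0
\end{array} \tag{6.17}$$

Strong duality holds. The primal program can be simplified by noting that

$$\begin{pmatrix} Y_{11} & 0 \\ 0 & Y_{22} \end{pmatrix} \ge \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix}$$

holds if and only if $\sqrt{Y_{22}}\,Y_{11}\sqrt{Y_{22}} \ge I$. This allows us to simplify the primal problem and we find

$$\max_{\sigma_B \in \mathcal{S}_\bullet(B)} \sqrt{F(\rho_{AB}, I_A \otimes \sigma_B)} = \inf_{Y_{AB} > 0} \tfrac{1}{2}\mathrm{Tr}\big(\rho_{AB}Y_{AB}^{-1}\big) + \tfrac{1}{2}\|Y_B\|_\infty. \tag{6.18}$$

Now, by the arithmetic geometric mean inequality, we have

$$\tfrac{1}{2}\mathrm{Tr}\big(\rho_{AB}Y_{AB}^{-1}\big) + \tfrac{1}{2}\|Y_B\|_\infty \ge \sqrt{\mathrm{Tr}\big(\rho_{AB}Y_{AB}^{-1}\big)\,\|Y_B\|_\infty} = \tfrac{1}{2}\mathrm{Tr}\big(\rho_{AB}(cY_{AB})^{-1}\big) + \tfrac{1}{2}\|cY_B\|_\infty \tag{6.19}$$

$$\ge \inf_{Y_{AB} > 0} \tfrac{1}{2}\mathrm{Tr}\big(\rho_{AB}Y_{AB}^{-1}\big) + \tfrac{1}{2}\|Y_B\|_\infty. \tag{6.20}$$

Here, $c$ is chosen such that $\tfrac{1}{c}\mathrm{Tr}\big(\rho_{AB}Y_{AB}^{-1}\big) = c\,\|Y_B\|_\infty$, such that the arithmetic geometric mean inequality becomes an equality. Therefore we have

$$\max_{\sigma_B \in \mathcal{S}_\bullet(B)} \sqrt{F(\rho_{AB}, I_A \otimes \sigma_B)} = \inf_{Y_{AB} > 0} \sqrt{\mathrm{Tr}\big(\rho_{AB}Y_{AB}^{-1}\big)\,\|Y_B\|_\infty} \tag{6.21}$$

and the desired equality follows. □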
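The rescaling step in (6.19) only involves two positive numbers, $t = \mathrm{Tr}(\rho_{AB}Y_{AB}^{-1})$ and $n = \|Y_B\|_\infty$, and can be illustrated with scalars. The values of `t` and `n` below are hypothetical, as are the helper names; replacing $Y_{AB}$ by $cY_{AB}$ maps $(t, n)$ to $(t/c, cn)$.

```python
import math

def am_gm_gap(t, n):
    """Arithmetic mean minus geometric mean of t and n (always non-negative)."""
    return 0.5 * t + 0.5 * n - math.sqrt(t * n)

def rescaled(t, n):
    """Values of Tr(rho (cY)^{-1}) and ||c Y_B|| for the balancing choice c = sqrt(t / n)."""
    c = math.sqrt(t / n)
    return t / c, c * n

t, n = 0.2, 3.0                      # hypothetical values of the two terms
t_c, n_c = rescaled(t, n)
print(am_gm_gap(t, n))               # strictly positive before rescaling
print(t_c, n_c, math.sqrt(t * n))    # both terms equal sqrt(t n) after rescaling
print(am_gm_gap(t_c, n_c))           # numerically zero
```

The product $t\,n$ is invariant under the rescaling, which is why the infimum of the arithmetic mean in (6.18) coincides with the infimum of the geometric mean in (6.21).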

This can be used to prove upper bounds on the max-entropy. For example, the quantity $\bar{H}^{\uparrow}_{0}(A|B)_\rho$, which is sometimes used instead of the max-entropy [139], is an upper bound on $H_{\max}(A|B)_\rho$:

$$\bar{H}^{\uparrow}_{0}(A|B)_\rho = \log\max_{\sigma_B \in \mathcal{S}_\bullet(B)} \mathrm{Tr}\big(\{\rho_{AB} > 0\}\,I_A \otimes \sigma_B\big) \ge H_{\max}(A|B)_\rho. \tag{6.22}$$

This follows from Lemma 6.3 by the choice $Y_{AB} = \{\rho_{AB} > 0\} + \varepsilon I_{AB}$ with $\varepsilon \to 0$, which yields the projector onto the support of $\rho_{AB}$. Furthermore, we have

$$\big\|\mathrm{Tr}_A(\{\rho_{AB} > 0\})\big\|_\infty = \max_{\sigma_B \in \mathcal{S}_\bullet(B)} \mathrm{Tr}\big(\{\rho_{AB} > 0\}\,I_A \otimes \sigma_B\big). \tag{6.23}$$

Min- and Max-Entropy Duality 

Finally, the max-entropy can be expressed as a min-entropy of the purified state using the duality relation in Proposition 5.3, which for this special case was first established by König et al. [101].

Lemma 6.4. Let $\rho \in \mathcal{S}_\bullet(ABC)$ be pure. Then, $H_{\max}(A|B)_\rho = -H_{\min}(A|C)_\rho$.

Proof. We have already seen in Proposition 5.3 that this relation holds for normalized states. The lemma thus follows from the observation that

$$H_{\min}(A|B)_{\tilde\rho} = H_{\min}(A|B)_\rho - \log t, \quad\text{and}\quad H_{\max}(A|B)_{\tilde\rho} = H_{\max}(A|B)_\rho + \log t \tag{6.24}$$

for any $\tilde\rho_{AB} \in \mathcal{S}_\bullet(AB)$ and $\rho_{AB} \in \mathcal{S}_\circ(AB)$ with $\tilde\rho_{AB} = t\rho_{AB}$. □


6.1.4 Classical Information and Guessing Probability 


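First, let us specialize some of the results in Proposition 5.1 to the min- and max-entropy. In the limit $\alpha \to \infty$ and at $\alpha = \tfrac{1}{2}$, we find that

$$H_{\min}(A|BY)_\rho = -\log\Big(\sum_y \rho(y)\exp\big(-H_{\min}(A|B)_{\hat\rho(y)}\big)\Big), \text{ and} \tag{6.25}$$

$$H_{\max}(A|BY)_\rho = \log\Big(\sum_y \rho(y)\exp\big(H_{\max}(A|B)_{\hat\rho(y)}\big)\Big). \tag{6.26}$$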


Guessing Probability 

The classical min-entropy $H_{\min}(X|Y)_\rho$ can be interpreted as a guessing probability. Consider an observer with access to $Y$. What is the probability that this observer guesses $X$ correctly, using his optimal strategy? The optimal strategy of the observer is clearly to guess that the event with the highest probability (conditioned on his observation) will occur. As before, we denote the probability distribution of $x$ conditioned on a fixed $y$ by $\rho(x|y)$. Then, the guessing probability (averaged over the random variable $Y$) is given by

$$\sum_y \rho(y)\max_x \rho(x|y) = \exp(-H_{\min}(X|Y)_\rho). \tag{6.27}$$

It was shown by König et al. [101] that this interpretation of the min-entropy extends to the case where $Y$ is replaced by a quantum system $B$ and the allowed strategies include arbitrary measurements of $B$.
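The classical relation (6.27) is simple enough to evaluate directly. The following sketch uses two hypothetical joint distributions and illustrative function names; it computes the guessing probability and the resulting min-entropy (natural logarithm) for a perfectly correlated pair and for an independent, uniformly distributed $X$.

```python
import math

def guessing_probability(p_xy):
    """Left-hand side of (6.27): sum_y p(y) max_x p(x|y) = sum_y max_x p(x, y)."""
    return sum(max(row) for row in p_xy)

def min_entropy(p_xy):
    """H_min(X|Y) obtained from (6.27); p_xy[y][x] is the joint probability."""
    return -math.log(guessing_probability(p_xy))

# Hypothetical examples: rows indexed by y, columns by x.
correlated = [[0.5, 0.0],
              [0.0, 0.5]]                 # Y determines X
independent = [[0.125] * 4,
               [0.125] * 4]               # X uniform on 4 values, independent of Y

print(guessing_probability(correlated), min_entropy(correlated))
print(guessing_probability(independent), min_entropy(independent))
```

In the first case the observer always guesses correctly and the min-entropy vanishes; in the second case the side information is useless and the min-entropy equals $\log 4$.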

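Consider a classical-quantum state $\rho_{XB} = \sum_x |x\rangle\langle x| \otimes \rho_B(x)$. For states of this form, the min-entropy simplifies to

$$\exp(-H_{\min}(X|B)_\rho) = \max_{\mathcal{E} \in \mathrm{CPTP}(B,X')} \langle\Psi_{XX'}|\,\mathcal{E}(\rho_{XB})\,|\Psi_{XX'}\rangle \tag{6.28}$$

$$= \max_{\mathcal{E} \in \mathrm{CPTP}(B,X')} \sum_x \langle x|\,\mathcal{E}(\rho_B(x))\,|x\rangle_{X'}. \tag{6.29}$$

The latter expression clearly reaches its maximum when $\mathcal{E}$ has classical output in the basis $\{|x\rangle_{X'}\}_x$, or in other words, when $\mathcal{E}$ is a measurement map of the form $\mathcal{E} : \rho_B \mapsto \sum_y \mathrm{Tr}(\rho_B M_y)\,|y\rangle\langle y|$ for a POVM $\{M_y\}_y$. We can thus equivalently write

$$\exp(-H_{\min}(X|B)_\rho) = \max_{\{M_y\}_y \text{ a POVM}} \sum_y \mathrm{Tr}(M_y\,\rho_B(y)). \tag{6.30}$$

Moreover, let $\{M_y\}$ be a measurement that achieves the maximum in the above expression and define $\tau(x,y) = \mathrm{Tr}(M_y\,\rho_B(x))$ as the probability that the true value is $x$ and the observer’s guess is $y$. Then,

$$\exp(-H_{\min}(X|B)_\rho) = \sum_y \mathrm{Tr}(M_y\,\rho_B(y)) \tag{6.31}$$

$$\le \sum_y \max_x \mathrm{Tr}(M_y\,\rho_B(x)) = \exp(-H_{\min}(X|Y)_\tau), \tag{6.32}$$

and this is in fact an equality by the data-processing inequality. Thus, it is evident that $H_{\min}(X|B)_\rho = H_{\min}(X|Y)_\tau$ can be achieved by a measurement on $B$.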


6.2 Smooth Entropies 

The smooth entropies of a state $\rho$ are defined as optimizations over the min- and max-entropies of states $\tilde\rho$ that are close to $\rho$ in purified distance. Here, we define the purified distance and the smooth min- and max-entropies and explore some properties of the smoothing.


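6.2.1 Definition of the ε-Ball

We introduce sets of ε-close states that will be used to define the smooth entropies.


Definition 6.4. Let $\rho \in \mathcal{S}_\bullet(A)$ and $0 \leq \varepsilon < \sqrt{\mathrm{Tr}(\rho)}$. We define the $\varepsilon$-ball of states in
$\mathcal{S}_\bullet(A)$ around $\rho$ as

$$\mathcal{B}^\varepsilon(A;\rho) := \{\tau \in \mathcal{S}_\bullet(A) : P(\tau,\rho) \leq \varepsilon\}. \tag{6.33}$$

Furthermore, we define the $\varepsilon$-ball of pure states around $\rho$ as
$\mathcal{B}^\varepsilon_*(A;\rho) := \{\tau \in \mathcal{B}^\varepsilon(A;\rho) : \mathrm{rank}(\tau) = 1\}$.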


For the remainder of this chapter, we will assume that $\varepsilon$ is sufficiently small so that $\varepsilon < \sqrt{\mathrm{Tr}\,\rho}$
is always satisfied. Furthermore, if it is clear from the context which system is meant, we will
omit it and simply use the notation $\mathcal{B}^\varepsilon(\rho)$. We now list some properties of this $\varepsilon$-ball, in addition
to the properties of the underlying purified distance metric.

i. The set $\mathcal{B}^\varepsilon(A;\rho)$ is compact and convex.

ii. The ball grows monotonically in the smoothing parameter $\varepsilon$, namely
$\varepsilon < \varepsilon' \implies \mathcal{B}^\varepsilon(A;\rho) \subset \mathcal{B}^{\varepsilon'}(A;\rho)$. Furthermore, $\mathcal{B}^0(A;\rho) = \{\rho\}$.


6.2.2 Definition of Smooth Entropies 

The smooth entropies are now defined as follows. 


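Definition 6.5. Let $\rho_{AB} \in \mathcal{S}_\bullet(AB)$ and $\varepsilon \geq 0$. Then, we define the $\varepsilon$-smooth min- and
max-entropy of $A$ conditioned on $B$ of the state $\rho_{AB}$ as

\begin{align}
H^\varepsilon_{\min}(A|B)_\rho &:= \max_{\tilde{\rho}_{AB} \in \mathcal{B}^\varepsilon(\rho_{AB})} H_{\min}(A|B)_{\tilde{\rho}} \quad \text{and} \tag{6.34} \\
H^\varepsilon_{\max}(A|B)_\rho &:= \min_{\tilde{\rho}_{AB} \in \mathcal{B}^\varepsilon(\rho_{AB})} H_{\max}(A|B)_{\tilde{\rho}} . \tag{6.35}
\end{align}


Note that the extrema can be achieved due to the compactness of the $\varepsilon$-ball (cf. Property i.). We
usually use $\tilde{\rho}$ to denote the state that achieves the extremum. Moreover, the smooth min-entropy
is monotonically increasing in $\varepsilon$ and the smooth max-entropy is monotonically decreasing in $\varepsilon$
(cf. Property ii.). Furthermore,

$$H^0_{\min}(A|B)_\rho = H_{\min}(A|B)_\rho \quad \text{and} \quad H^0_{\max}(A|B)_\rho = H_{\max}(A|B)_\rho . \tag{6.36}$$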

If $\rho_{AB}$ is normalized, the optimization problems defining the smooth min- and max-entropies
can be formulated as SDPs. To see this, note that the restrictions on the smoothed state $\tilde{\rho}$ are
linear in the purification $\rho_{ABC}$ of $\rho_{AB}$. In particular, consider the condition $P(\tilde{\rho},\rho) \leq \varepsilon$ on $\tilde{\rho}$, or,
equivalently, $F_*(\tilde{\rho},\rho) \geq 1 - \varepsilon^2$. If $\rho_{ABC}$ is normalized, then the squared fidelity can be expressed
as $F_*(\tilde{\rho},\rho) = \mathrm{Tr}(\tilde{\rho}_{ABC}\, \rho_{ABC})$.

We give the primal of the SDP for $\exp(-H^\varepsilon_{\min}(A|B)_\rho)$ as an example. This SDP is parametrized
by an (arbitrary) purification $\rho_{ABC} \in \mathcal{S}_\circ(ABC)$.







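primal problem
minimize:    $\mathrm{Tr}(\sigma_B)$
subject to:  $I_A \otimes \sigma_B \geq \mathrm{Tr}_C(\tilde{\rho}_{ABC})$        (6.37)
             $\mathrm{Tr}(\tilde{\rho}_{ABC}) \leq 1$
             $\mathrm{Tr}(\tilde{\rho}_{ABC}\, \rho_{ABC}) \geq 1 - \varepsilon^2$
             $\tilde{\rho}_{ABC} \in \mathcal{P}(ABC)$, $\sigma_B \in \mathcal{P}(B)$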

This program allows us to efficiently compute the smooth min-entropy as long as the involved 
Hilbert space dimensions are small. 


6.2.3 Remarks on Smoothing 

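For both the smooth min- and max-entropy, we can restrict the optimization in Definition 6.5 to
states in the support of $\rho_A \otimes \rho_B$.

Proposition 6.1. Let $\rho_{AB} \in \mathcal{S}_\bullet(AB)$ and $0 \leq \varepsilon < \sqrt{\mathrm{Tr}(\rho_{AB})}$. Then, there exist respective states
$\tilde{\rho}_{AB} \in \mathcal{B}^\varepsilon(\rho_{AB})$ in the support of $\rho_A \otimes \rho_B$ such that

$$H^\varepsilon_{\min}(A|B)_\rho = H_{\min}(A|B)_{\tilde{\rho}} \quad \text{or} \quad H^\varepsilon_{\max}(A|B)_\rho = H_{\max}(A|B)_{\tilde{\rho}} . \tag{6.38}$$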

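Proof. Let $\rho_{ABC}$ be any purification of $\rho_{AB}$. Moreover, let $\Pi_{AB} = \{\rho_A > 0\} \otimes \{\rho_B > 0\}$ be the
projector onto the support of $\rho_A \otimes \rho_B$.

For the min-entropy, first consider any state $\tilde{\rho}'_{AB} \in \mathcal{B}^\varepsilon(\rho_{AB})$ that achieves the maximum in
Definition 6.5. Then, there exists a $\sigma'_B \in \mathcal{P}(B)$ with $H^\varepsilon_{\min}(A|B)_\rho = -\log \mathrm{Tr}(\sigma'_B)$ such that

$$\tilde{\rho}'_{AB} \leq I_A \otimes \sigma'_B \implies \underbrace{\Pi_{AB}\, \tilde{\rho}'_{AB}\, \Pi_{AB}}_{=:\, \tilde{\rho}_{AB}} \leq \{\rho_A > 0\} \otimes \underbrace{\{\rho_B > 0\}\, \sigma'_B\, \{\rho_B > 0\}}_{=:\, \sigma_B} . \tag{6.39}$$

Moreover, $\tilde{\rho}_{AB} \in \mathcal{B}^\varepsilon(\rho_{AB})$ since the purified distance contracts under trace non-increasing maps,
and $\mathrm{Tr}(\sigma_B) \leq \mathrm{Tr}(\sigma'_B)$. We conclude that $\tilde{\rho}_{AB}$ must be optimal.

For the max-entropy, again we start with any state $\tilde{\rho}'_{AB} \in \mathcal{B}^\varepsilon(\rho_{AB})$ that achieves the minimum
in Definition 6.5. Then, using $\tilde{\rho}_{AB}$ as defined above,

\begin{align}
\max_{\sigma_B \in \mathcal{S}_\circ(B)} F(\tilde{\rho}_{AB}, I_A \otimes \sigma_B) &= \max_{\sigma_B \in \mathcal{S}_\circ(B)} F\big(\Pi_{AB}\, \tilde{\rho}'_{AB}\, \Pi_{AB}, I_A \otimes \sigma_B\big) \tag{6.40} \\
&= \max_{\sigma_B \in \mathcal{S}_\circ(B)} F\big(\tilde{\rho}'_{AB}, \{\rho_A > 0\} \otimes \{\rho_B > 0\}\, \sigma_B\, \{\rho_B > 0\}\big) \tag{6.41} \\
&\leq \max_{\sigma_B \in \mathcal{S}_\circ(B)} F(\tilde{\rho}'_{AB}, I_A \otimes \sigma_B) . \tag{6.42}
\end{align}

Hence, $H_{\max}(A|B)_{\tilde{\rho}} \leq H_{\max}(A|B)_{\tilde{\rho}'}$, concluding the proof. □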


Note that these optimal states are not necessarily normalized. In fact, it is in general not possible
to find a normalized state in the support of $\rho_A \otimes \rho_B$ that achieves the optimum. Allowing
sub-normalized states, we avoid this problem and as a consequence the smooth entropies are invariant
under embeddings into a larger space.
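
Corollary 6.1. For any state $\rho_{AB} \in \mathcal{S}_\bullet(AB)$ and isometries $U : A \to A'$ and $V : B \to B'$, we
have

$$H^\varepsilon_{\min}(A|B)_\rho = H^\varepsilon_{\min}(A'|B')_\tau, \qquad H^\varepsilon_{\max}(A|B)_\rho = H^\varepsilon_{\max}(A'|B')_\tau \tag{6.43}$$

where $\tau_{A'B'} = (U \otimes V)\, \rho_{AB}\, (U \otimes V)^\dagger$.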


On the other hand, if $\rho$ is normalized, we can always find normalized optimal states if we
embed the systems $A$ and $B$ into large enough Hilbert spaces that allow smoothing outside the support
of $\rho_A \otimes \rho_B$. For the min-entropy, this is intuitively true since adding weight in a space orthogonal
to $A$, if sufficiently diluted, will neither affect the min-entropy nor the purified distance.

Lemma 6.5. There exists an embedding from $A$ to $A'$ and a normalized state $\tilde{\rho}_{A'B} \in \mathcal{B}^\varepsilon(\rho_{A'B})$
such that $H_{\min}(A'|B)_{\tilde{\rho}} = H^\varepsilon_{\min}(A|B)_\rho$.

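Proof. Let $\{\tilde{\rho}_{AB}, \sigma_B\}$ be such that they maximize the smooth min-entropy $\lambda = H^\varepsilon_{\min}(A|B)_\rho$, i.e.
we have $\tilde{\rho}_{AB} \leq \exp(-\lambda)\, I_A \otimes \sigma_B$. Then we embed $A$ into an auxiliary system $A'$ with dimension
$d_A + d_{\bar{A}}$, with $d_{\bar{A}}$ to be defined below. The state $\tilde{\rho}_{A'B} = \tilde{\rho}_{AB} \oplus (1 - \mathrm{Tr}(\tilde{\rho}))\, \pi_{\bar{A}} \otimes \sigma_B$ satisfies

$$\tilde{\rho}_{A'B} = \tilde{\rho}_{AB} \oplus (1 - \mathrm{Tr}(\tilde{\rho}))\, \pi_{\bar{A}} \otimes \sigma_B \leq \exp(-\lambda)\, (I_A \oplus I_{\bar{A}}) \otimes \sigma_B \tag{6.44}$$

if $\exp(\lambda)(1 - \mathrm{Tr}(\tilde{\rho})) \leq \exp(\lambda) \leq d_{\bar{A}}$. Hence, if $d_{\bar{A}}$ is chosen large enough, we have $H_{\min}(A'|B)_{\tilde{\rho}} \geq \lambda$.
Moreover, $F_*(\tilde{\rho}_{A'B}, \rho_{A'B}) = F_*(\tilde{\rho}_{AB}, \rho_{AB})$ is not affected by adding the orthogonal subspace. □

For the max-entropy, a similar statement can be derived using the duality of the smooth
entropies.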


Smoothing Classical States 

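Finally, smoothing respects the structure of the state $\rho$; in particular, if some subsystems are
classical then the optimal state $\tilde{\rho}$ will also be classical on these systems.

Lemma 6.6. For both $H^\varepsilon_{\min}(AX|BY)_\rho$ and $H^\varepsilon_{\max}(AX|BY)_\rho$, there exists an optimizer
$\tilde{\rho}_{AXBY} \in \mathcal{B}^\varepsilon(\rho_{AXBY})$ that is classical on $X$ and $Y$.

Proof. Consider the pinching maps $\mathcal{P}_X(\cdot) = \sum_x |x\rangle\langle x| \cdot |x\rangle\langle x|$ and $\mathcal{P}_Y$ defined analogously. Since
these are CPTP and unital, we immediately find that $H_{\min}(AX|BY)_{\tilde{\rho}'} \leq H_{\min}(AX|BY)_{\tilde{\rho}}$ for any
state $\tilde{\rho}'_{AXBY}$ and $\tilde{\rho}_{AXBY} = \mathcal{P}_X \otimes \mathcal{P}_Y(\tilde{\rho}'_{AXBY})$ of the desired form. Since $\rho_{AXBY}$ is invariant under
this pinching, the state $\tilde{\rho}$ lies in $\mathcal{B}^\varepsilon(\rho)$ if $\tilde{\rho}'$ lies in the ball. Hence, $\tilde{\rho}$ must be optimal.

For the max-entropy, we follow the argument in the proof of the previous lemma, leveraging
on Lemma 3.3. Using the state $\tilde{\rho}$ from above, this yields

\begin{align}
\max_{\sigma_{BY} \in \mathcal{S}_\circ(BY)} F(\tilde{\rho}_{AXBY}, I_{AX} \otimes \sigma_{BY}) &= \max_{\sigma_{BY} \in \mathcal{S}_\circ(BY)} F\big(\tilde{\rho}'_{AXBY}, \mathcal{P}_X(I_{AX}) \otimes \mathcal{P}_Y(\sigma_{BY})\big) \tag{6.45} \\
&\leq \max_{\sigma_{BY} \in \mathcal{S}_\circ(BY)} F(\tilde{\rho}'_{AXBY}, I_{AX} \otimes \sigma_{BY}) . \tag{6.46}
\end{align}

Hence, $H_{\max}(AX|BY)_{\tilde{\rho}} \leq H_{\max}(AX|BY)_{\tilde{\rho}'}$, concluding the proof. □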

6.3 Properties of the Smooth Entropies

The smooth entropies inherit many properties of the respective underlying unsmoothed Rényi
entropies, including data-processing inequalities, duality relations and chain rules.


6.3.1 Duality Relation and Beyond 

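The duality relation in Lemma 6.4 extends to smooth entropies.

Proposition 6.2. Let $\rho \in \mathcal{S}_\bullet(ABC)$ be pure and $0 \leq \varepsilon < \sqrt{\mathrm{Tr}(\rho)}$. Then,

$$H^\varepsilon_{\max}(A|B)_\rho = -H^\varepsilon_{\min}(A|C)_\rho . \tag{6.47}$$


Proof. According to Corollary 6.1, the smooth entropies are invariant under embeddings, and we
can thus assume without loss of generality that the spaces $B$ and $C$ are large enough to entertain
purifications of the optimal smoothed states, which are in the support of $\rho_A \otimes \rho_B$ and $\rho_A \otimes \rho_C$,
respectively. Let $\tilde{\rho}_{AB}$ be optimal for the max-entropy, then

\begin{align}
H^\varepsilon_{\max}(A|B)_\rho = H_{\max}(A|B)_{\tilde{\rho}} &\geq \min_{\tilde{\rho} \in \mathcal{B}^\varepsilon_*(\rho_{ABC})} H_{\max}(A|B)_{\tilde{\rho}} \tag{6.48} \\
&= \min_{\tilde{\rho} \in \mathcal{B}^\varepsilon_*(\rho_{ABC})} -H_{\min}(A|C)_{\tilde{\rho}} \geq \min_{\tilde{\rho} \in \mathcal{B}^\varepsilon(\rho_{AC})} -H_{\min}(A|C)_{\tilde{\rho}} = -H^\varepsilon_{\min}(A|C)_\rho . \tag{6.49}
\end{align}

And, using the same argument starting with $H^\varepsilon_{\min}(A|C)_\rho$, we can show the opposite inequality. □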

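Due to the monotonicity in $\alpha$ of the Rényi entropies the min-entropy cannot exceed the max-
entropy for normalized states. This result extends to smooth entropies [116,169].

Proposition 6.3. Let $\rho \in \mathcal{S}_\circ(AB)$ and $\varphi, \vartheta \geq 0$ such that $\varphi + \vartheta < \frac{\pi}{2}$. Then,

$$H^{\sin(\varphi)}_{\min}(A|B)_\rho \leq H^{\sin(\vartheta)}_{\max}(A|B)_\rho + 2 \log \frac{1}{\cos(\varphi + \vartheta)} . \tag{6.50}$$

Proof. Set $\varepsilon = \sin(\varphi)$. According to Lemma 6.5, there exists an embedding $A'$ of $A$ and a
normalized state $\tilde{\rho}_{A'B} \in \mathcal{B}^\varepsilon(\rho_{A'B})$ such that $H_{\min}(A'|B)_{\tilde{\rho}} = H^\varepsilon_{\min}(A|B)_\rho$. In particular, there
exists a state $\sigma_B \in \mathcal{S}_\circ(B)$ such that $\tilde{\rho}_{A'B} \leq \exp(-\lambda)\, I_{A'} \otimes \sigma_B$ with $\lambda = H^\varepsilon_{\min}(A|B)_\rho$. Thus, letting
$\hat{\rho}_{A'B} \in \mathcal{B}^{\sin(\vartheta)}(\rho_{A'B})$ be a state that minimizes the smooth max-entropy, we find

\begin{align}
H^{\sin(\vartheta)}_{\max}(A|B)_\rho = H_{\max}(A'|B)_{\hat{\rho}} &\geq -\widetilde{D}_{1/2}(\hat{\rho}_{A'B} \| I_{A'} \otimes \sigma_B) \tag{6.51} \\
&\geq \lambda - \widetilde{D}_{1/2}(\hat{\rho}_{A'B} \| \tilde{\rho}_{A'B}) = \lambda + \log\big(1 - P(\hat{\rho}_{A'B}, \tilde{\rho}_{A'B})^2\big) \tag{6.52} \\
&\geq H^{\sin(\varphi)}_{\min}(A|B)_\rho + \log\big(1 - \sin(\varphi + \vartheta)^2\big) . \tag{6.53}
\end{align}

In the final step we used the triangle inequality in (3.59) to find $P(\hat{\rho}_{A'B}, \tilde{\rho}_{A'B}) \leq \sin(\varphi + \vartheta)$. □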

Proposition 6.3 implies that smoothing states that have similar min- and max-entropies has
almost no effect. In particular, let $\rho_{AB} \in \mathcal{S}_\circ(AB)$ with $H_{\min}(A|B)_\rho = H_{\max}(A|B)_\rho$. Then,

$$H^\varepsilon_{\min}(A|B)_\rho \leq H_{\max}(A|B)_\rho - \log(1 - \varepsilon^2) = H_{\min}(A|B)_\rho - \log(1 - \varepsilon^2) . \tag{6.54}$$

This inequality is tight and the smoothed state $\tilde{\rho} = (1 - \varepsilon^2)\rho$ reaches equality. An analogous
relation can be derived for the smooth max-entropy.


6.3.2 Chain Rules 

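Similar to the conditional Rényi entropies, we also provide a collection of inequalities that replace
the chain rule of the von Neumann entropy. These chain rules are different in that they introduce
an additional correction term in $O(\log \frac{1}{\varepsilon})$ that does not appear in the results of the previous
chapter.


Theorem 6.1. Let $\rho \in \mathcal{S}_\bullet(ABC)$ and $\varepsilon, \varepsilon', \varepsilon'' \in [0,1)$ with $\varepsilon > \varepsilon' + 2\varepsilon''$. Then,

\begin{align}
H^{\varepsilon}_{\min}(AB|C)_\rho &\geq H^{\varepsilon''}_{\min}(A|BC)_\rho + H^{\varepsilon'}_{\min}(B|C)_\rho - g(\delta), \tag{6.55} \\
H^{\varepsilon'}_{\min}(AB|C)_\rho &\leq H^{\varepsilon}_{\min}(A|BC)_\rho + H^{\varepsilon''}_{\max}(B|C)_\rho + 2g(\delta), \tag{6.56} \\
H^{\varepsilon'}_{\min}(AB|C)_\rho &\leq H^{\varepsilon''}_{\max}(A|BC)_\rho + H^{\varepsilon}_{\min}(B|C)_\rho + 3g(\delta), \tag{6.57}
\end{align}

where $g(\delta) = -\log\big(1 - \sqrt{1 - \delta^2}\big)$ and $\delta = \varepsilon - \varepsilon' - 2\varepsilon''$.

See [169] for a proof. Using the duality relation for smooth entropies on (6.55), (6.56) and
(6.57), we also find the chain rules

\begin{align}
H^{\varepsilon}_{\max}(AB|C)_\rho &\leq H^{\varepsilon'}_{\max}(A|BC)_\rho + H^{\varepsilon''}_{\max}(B|C)_\rho + g(\delta), \tag{6.58} \\
H^{\varepsilon'}_{\max}(AB|C)_\rho &\geq H^{\varepsilon''}_{\min}(A|BC)_\rho + H^{\varepsilon}_{\max}(B|C)_\rho - 2g(\delta), \tag{6.59} \\
H^{\varepsilon'}_{\max}(AB|C)_\rho &\geq H^{\varepsilon}_{\max}(A|BC)_\rho + H^{\varepsilon''}_{\min}(B|C)_\rho - 3g(\delta). \tag{6.60}
\end{align}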


Classical Information 

Sometimes the following alternative bounds restricted to classical information are very useful. 
The first result asserts that the entropy of a classical register is always non-negative and bounds 
how much entropy it can contain. 

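Lemma 6.7. Let $\varepsilon \in [0,1)$ and $\rho \in \mathcal{S}_\bullet(XAB)$ be classical on $X$. Then,

\begin{align}
H^\varepsilon_{\min}(A|B)_\rho &\leq H^\varepsilon_{\min}(XA|B)_\rho \leq H^\varepsilon_{\min}(A|B)_\rho + \log d_X \quad \text{and} \tag{6.61} \\
H^\varepsilon_{\max}(A|B)_\rho &\leq H^\varepsilon_{\max}(XA|B)_\rho \leq H^\varepsilon_{\max}(A|B)_\rho + \log d_X . \tag{6.62}
\end{align}

We are also concerned with the maximum amount of information a classical register $Y$ can contain
about a quantum state $A$.

Lemma 6.8. Let $\varepsilon \in [0,1)$ and $\rho \in \mathcal{S}_\bullet(AYB)$ be classical on $Y$. Then,

\begin{align}
H^\varepsilon_{\min}(A|YB)_\rho &\geq H^\varepsilon_{\min}(A|B)_\rho - \log d_Y \quad \text{and} \tag{6.63} \\
H^\varepsilon_{\max}(A|YB)_\rho &\geq H^\varepsilon_{\max}(A|B)_\rho - \log d_Y . \tag{6.64}
\end{align}

We omit the proofs of the above statements, but note that they can be derived from (5.94) together
with the fact that the states achieving the optimum for the smooth entropies retain the classical-
quantum structure (cf. Lemma 6.6).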


6.3.3 Data-Processing Inequalities 

We expect measures of uncertainty of the system A given side information B to be non-decreasing 
under local physical operations (e.g. measurements or unitary evolutions) applied to the B system. 
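Furthermore, in analogy to the conditional Rényi entropies, we expect that the uncertainty of the
system $A$ does not decrease when a sub-unital map is executed on the $A$ system.


Theorem 6.2. Let $\rho_{AB} \in \mathcal{S}_\bullet(AB)$ and $0 \leq \varepsilon < \sqrt{\mathrm{Tr}(\rho)}$. Moreover, let $\mathcal{E} \in \mathrm{CPTP}(A,A')$ be
sub-unital, and let $\mathcal{F} \in \mathrm{CPTP}(B,B')$. Then, the state $\tau_{A'B'} = (\mathcal{E} \otimes \mathcal{F})(\rho_{AB})$ satisfies

$$H^\varepsilon_{\min}(A|B)_\rho \leq H^\varepsilon_{\min}(A'|B')_\tau \quad \text{and} \quad H^\varepsilon_{\max}(A|B)_\rho \leq H^\varepsilon_{\max}(A'|B')_\tau . \tag{6.65}$$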


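Proof. The data-processing inequality for the min-entropy follows from the respective property
of the unsmoothed conditional Rényi entropy. We have

$$H^\varepsilon_{\min}(A|B)_\rho = H_{\min}(A|B)_{\tilde{\rho}} \leq H_{\min}(A'|B')_{\tilde{\tau}} \leq H^\varepsilon_{\min}(A'|B')_\tau . \tag{6.66}$$

Here, $\tilde{\rho}_{AB}$ is a state maximizing the smooth min-entropy and $\tilde{\tau}_{A'B'} = (\mathcal{E} \otimes \mathcal{F})(\tilde{\rho}_{AB})$ lies in $\mathcal{B}^\varepsilon(\tau_{A'B'})$.
To prove the result for the max-entropy, we take advantage of the Stinespring dilation of $\mathcal{E}$
and $\mathcal{F}$. Namely, we introduce the isometries $U : A \to A'A''$ and $V : B \to B'B''$ and the state
$\tau_{A'A''B'B''} = (U \otimes V)\, \rho_{AB}\, (U^\dagger \otimes V^\dagger)$ of which $\tau_{A'B'}$ is a marginal. Let $\tilde{\tau} \in \mathcal{B}^\varepsilon(\tau_{A'A''B'B''})$ be the state
that minimizes the smooth max-entropy $H^\varepsilon_{\max}(A'|B')_\tau$. Then,

\begin{align}
H^\varepsilon_{\max}(A'|B')_\tau &= \max_{\sigma_{B'} \in \mathcal{S}_\circ(B')} \log F\big(\tilde{\tau}_{A'B'}, I_{A'} \otimes \sigma_{B'}\big) \tag{6.67} \\
&\geq \max_{\sigma_{B'} \in \mathcal{S}_\circ(B')} \log F\big(\tilde{\tau}_{A'B'}, \mathrm{Tr}_{A''}(\Pi_{A'A''}) \otimes \sigma_{B'}\big) . \tag{6.68}
\end{align}

We introduced the projector $\Pi_{A'A''} = UU^\dagger$ onto the image of $U$, which exhibits the following
property due to the fact that $\mathcal{E}$ is sub-unital:

$$\mathrm{Tr}_{A''}(\Pi_{A'A''}) = \mathrm{Tr}_{A''}(U I_A U^\dagger) = \mathcal{E}(I_A) \leq I_{A'} . \tag{6.69}$$

The inequality in (6.68) is then a result of the fact that the fidelity is non-increasing when an
argument $A$ is replaced by a smaller argument $B \leq A$. Next, we use the monotonicity of the
fidelity under partial trace to bound (6.68) further.

\begin{align}
H^\varepsilon_{\max}(A'|B')_\tau &\geq \max_{\sigma_{B'B''} \in \mathcal{S}_\circ(B'B'')} \log F\big(\tilde{\tau}_{A'A''B'B''}, \Pi_{A'A''} \otimes \sigma_{B'B''}\big) \tag{6.70} \\
&= \max_{\sigma_{B'B''} \in \mathcal{S}_\circ(B'B'')} \log F\big(\Pi_{A'A''}\, \tilde{\tau}_{A'A''B'B''}\, \Pi_{A'A''}, I_{A'A''} \otimes \sigma_{B'B''}\big) \tag{6.71} \\
&= H_{\max}(A'A''|B'B'')_{\hat{\tau}} . \tag{6.72}
\end{align}

Finally, we note that $\hat{\tau}_{A'A''B'B''} = \Pi_{A'A''}\, \tilde{\tau}_{A'A''B'B''}\, \Pi_{A'A''} \in \mathcal{B}^\varepsilon(\tau_{A'A''B'B''})$ due to the monotonicity
of the purified distance under trace non-increasing maps. Hence, we established
$H^\varepsilon_{\max}(A'|B')_\tau \geq H^\varepsilon_{\max}(A'A''|B'B'')_\tau = H^\varepsilon_{\max}(A|B)_\rho$, where the last equality follows due to the invariance of the
max-entropy under local isometries. □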


Functions on Classical Registers 

Let us now consider a state $\rho_{XAB}$ that is classical on $X$. We aim to show that applying a classical
function on the register $X$ cannot increase the smooth entropies of $AX$ given $B$, even if this operation
is not necessarily sub-unital. In particular, for the min-entropy this corresponds to the intuitive
statement that it is always at least as hard to guess the input of a function as it is to guess its
output.


Proposition 6.4. Let $\rho_{XAB} = \sum_x p_x\, |x\rangle\langle x|_X \otimes \rho_{AB}(x)$ be classical on $X$. Furthermore, let
$\varepsilon \in [0,1)$ and let $f : X \to Z$ be a function. Then, the state $\tau_{ZAB} = \sum_x p_x\, |f(x)\rangle\langle f(x)|_Z \otimes \rho_{AB}(x)$
satisfies

$$H^\varepsilon_{\min}(ZA|B)_\tau \leq H^\varepsilon_{\min}(XA|B)_\rho \quad \text{and} \quad H^\varepsilon_{\max}(ZA|B)_\tau \leq H^\varepsilon_{\max}(XA|B)_\rho . \tag{6.73}$$


Proof. A possible Stinespring dilation of $f$ is given by the isometry $U : |x\rangle_X \mapsto |x\rangle_{X'} \otimes |f(x)\rangle_Z$
followed by a partial trace over $X'$. Applying $U$ on $\rho_{XAB}$, we get

$$\tau_{X'ZAB} := U \rho_{XAB} U^\dagger = \sum_x p_x\, |x\rangle\langle x|_{X'} \otimes |f(x)\rangle\langle f(x)|_Z \otimes \rho_{AB}(x) \tag{6.74}$$

which is classical on $X'$ and $Z$ and an extension of $\tau_{ZAB}$. Hence, the invariance under isometries
of the smooth entropies (cf. Corollary 6.1) in conjunction with Lemma 6.7 implies

$$H^\varepsilon_{\min}(XA|B)_\rho = H^\varepsilon_{\min}(X'ZA|B)_\tau \geq H^\varepsilon_{\min}(ZA|B)_\tau . \tag{6.75}$$

An analogous argument applies for the smooth max-entropy. □


6.4 Fully Quantum Asymptotic Equipartition Property 

Smooth entropies give rise to an entropic (and fully quantum) version of the asymptotic
equipartition property (AEP), which states that both the (regularized) smooth min- and max-entropies
converge to the conditional von Neumann entropy for iid product states. The classical special
case of this, which is usually not expressed in terms of entropies (see, e.g., [38]), is a workhorse
of classical information theory and similarly the quantum AEP has already found many
applications.

The entropic form of the AEP explains the crucial role of the von Neumann entropy to
describe information theoretic tasks. While operational quantities in information theory (such as
the amount of extractable randomness, the minimal length of compressed data and channel
capacities) can naturally be expressed in terms of smooth entropies in the one-shot setting, the von
Neumann entropy is recovered if we consider a large number of independent repetitions of the
task.

Moreover, the entropic approach to asymptotic equipartition lends itself to a generalization
to the quantum setting. Note that the traditional approach, which considers the AEP as a
statement about (conditional) probabilities, does not have a natural quantum generalization due to the
fact that we do not know a suitable generalization of conditional probabilities to quantum side
information. Figure 6.1 visualizes the intuitive idea behind the entropic AEP.



[Figure 6.1: plot of the surprisal rate over the cumulated probability (horizontal axis from 0.0 to 1.0); image not reproduced.]

Fig. 6.1 Emergence of Typical Set. We consider $n$ independent Bernoulli trials with $p = 0.2$ and denote the
probability that an event $x^n$ (a bit string of length $n$) occurs by $P_n(x^n)$. The plot shows the surprisal rate,
$-\frac{1}{n} \log P_n(x^n)$, over the cumulated probability of the events sorted such that events with high surprisal are
on the left. The curves for $n = \{50, 100, 500, 2500\}$ converge to the von Neumann entropy, $H(X) \approx 0.72$, as $n$
increases. This indicates that, for large $n$, most (in probability) events are close to typical (i.e. they have
surprisal rate close to $H(X)$). The min-entropy, $H_{\min}(X) \approx 0.32$, constitutes the minimum of the curves while
the max-entropy, $H_{\max}(X) \approx 0.85$, is upper bounded by their maximum. Moreover, the respective
$\varepsilon$-smooth entropies, $\frac{1}{n} H^\varepsilon_{\min}(X^n)$ and $\frac{1}{n} H^\varepsilon_{\max}(X^n)$, can be approximately obtained by cutting off a
probability $\varepsilon$ from each side of the x-axis and taking the minima or maxima of the remaining curve. Clearly,
the $\varepsilon$-smooth entropies converge to the von Neumann entropy as $n$ increases.
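The numbers quoted in the caption of Fig. 6.1 can be reproduced with a few lines of code. The sketch below assumes base-2 logarithms and takes the max-entropy to be the Rényi entropy of order 1/2; the function `typical_mass` and the tolerance `delta = 0.05` are illustrative choices that do not appear in the text.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def min_entropy(probs):
    """Min-entropy in bits: surprisal of the most likely outcome."""
    return -math.log2(max(probs))

def max_entropy(probs):
    """Max-entropy in bits, taken here as the Renyi entropy of order 1/2."""
    return 2 * math.log2(sum(math.sqrt(p) for p in probs))

def typical_mass(n, p, delta):
    """Probability that the surprisal rate of n Bernoulli(p) trials
    lies within delta of the Shannon entropy."""
    h = shannon_entropy([p, 1 - p])
    mass = 0.0
    for k in range(n + 1):  # k = number of ones in the bit string
        rate = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
        if abs(rate - h) <= delta:
            # log of C(n, k) p^k (1-p)^(n-k), evaluated in the log domain
            # to avoid overflow of the binomial coefficient
            log_prob = (math.lgamma(n + 1) - math.lgamma(k + 1)
                        - math.lgamma(n - k + 1)
                        + k * math.log(p) + (n - k) * math.log(1 - p))
            mass += math.exp(log_prob)
    return mass

bernoulli = [0.2, 0.8]
h = shannon_entropy(bernoulli)
h_min = min_entropy(bernoulli)
h_max = max_entropy(bernoulli)
mass_50 = typical_mass(50, 0.2, 0.05)
mass_2500 = typical_mass(2500, 0.2, 0.05)
print(round(h, 2), round(h_min, 2), round(h_max, 2), mass_50, mass_2500)
```

The probability mass within a fixed window around $H(X)$ grows with $n$, which is the concentration effect the figure illustrates.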


6.4.1 Lower Bounds on the Smooth Min-Entropy 

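For the sake of generality, we state our results here in terms of the smooth relative max-divergence,
which we define for any $\rho \in \mathcal{S}_\bullet(A)$ and $\sigma \in \mathcal{P}(A)$ as

$$D^\varepsilon_{\max}(\rho\|\sigma) := \min_{\tilde{\rho} \in \mathcal{B}^\varepsilon(\rho)} D_{\max}(\tilde{\rho}\|\sigma) . \tag{6.76}$$

The following gives an upper bound on the smooth relative max-entropy [45,155].


Lemma 6.9. Let $\rho \in \mathcal{S}_\bullet(A)$, $\sigma \in \mathcal{P}(A)$ and $\lambda \in (-\infty, D_{\max}(\rho\|\sigma)]$. Then,

$$D^\varepsilon_{\max}(\rho\|\sigma) \leq \lambda, \quad \text{where} \quad \varepsilon = \sqrt{2\,\mathrm{Tr}(\Sigma) - \mathrm{Tr}(\Sigma)^2} \tag{6.77}$$

and $\Sigma = \{\rho > \exp(\lambda)\sigma\}(\rho - \exp(\lambda)\sigma)$, i.e. the positive part of $\rho - \exp(\lambda)\sigma$.


The proof constructs a smoothed state $\tilde{\rho}$ that reduces the smooth relative max-divergence
relative to $\sigma$ by removing the subspace where $\rho$ exceeds $\exp(\lambda)\sigma$.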

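Proof. We first choose $\tilde{\rho}$, bound $D_{\max}(\tilde{\rho}\|\sigma)$ and then show that $\tilde{\rho} \in \mathcal{B}^\varepsilon(\rho)$. We use the
abbreviated notation $\Lambda := \exp(\lambda)\sigma$ and set

$$\tilde{\rho} := G \rho\, G^\dagger \quad \text{where} \quad G := \Lambda^{1/2} (\Lambda + \Sigma)^{-1/2}, \tag{6.78}$$

where we use the generalized inverse. From the definition of $\Sigma$, we have $\rho \leq \Lambda + \Sigma$; hence, $\tilde{\rho} \leq \Lambda$
and $D_{\max}(\tilde{\rho}\|\sigma) \leq \lambda$.

Let $|\rho\rangle$ be a purification of $\rho$, then $(G \otimes I)|\rho\rangle$ is a purification of $\tilde{\rho}$ and, using Uhlmann's
theorem, we find a bound on the (generalized) fidelity:

\begin{align}
\sqrt{F_*(\tilde{\rho},\rho)} &\geq |\langle\rho| G |\rho\rangle| + \sqrt{(1 - \mathrm{Tr}(\rho))(1 - \mathrm{Tr}(\tilde{\rho}))} \tag{6.79} \\
&\geq \Re\big(\mathrm{Tr}(G\rho)\big) + 1 - \mathrm{Tr}(\rho) = 1 - \mathrm{Tr}\big((I - \bar{G})\rho\big), \tag{6.80}
\end{align}

where we introduced $\bar{G} = \frac{1}{2}(G + G^\dagger)$ and $\Re$ denotes the real part. This can be simplified further
by noting that $G$ is a contraction. To see this, we multiply $\Lambda \leq \Lambda + \Sigma$ with $(\Lambda + \Sigma)^{-1/2}$ from left
and right to get

$$G^\dagger G = (\Lambda + \Sigma)^{-1/2}\, \Lambda\, (\Lambda + \Sigma)^{-1/2} \leq I . \tag{6.81}$$

Furthermore, $\bar{G} \leq I$, since $\|\bar{G}\| \leq 1$ by the triangle inequality and $\|G\| = \|G^\dagger\| \leq 1$. Moreover,

\begin{align}
\mathrm{Tr}\big((I - \bar{G})\rho\big) &\leq \mathrm{Tr}(\Lambda + \Sigma) - \mathrm{Tr}\big(\bar{G}(\Lambda + \Sigma)\big) \tag{6.82} \\
&= \mathrm{Tr}(\Lambda + \Sigma) - \mathrm{Tr}\big((\Lambda + \Sigma)^{1/2} \Lambda^{1/2}\big) \leq \mathrm{Tr}(\Sigma), \tag{6.83}
\end{align}

where we used $\rho \leq \Lambda + \Sigma$ and $\sqrt{\Lambda + \Sigma} \geq \sqrt{\Lambda}$. The latter inequality follows from the operator
monotonicity of the square root function. Finally, using the above bounds, the purified distance
between $\tilde{\rho}$ and $\rho$ is bounded by

$$P(\tilde{\rho},\rho) = \sqrt{1 - F_*(\tilde{\rho},\rho)} \leq \sqrt{1 - (1 - \mathrm{Tr}(\Sigma))^2} = \sqrt{2\,\mathrm{Tr}(\Sigma) - \mathrm{Tr}(\Sigma)^2} . \tag{6.84}$$

Hence, we verified that $\tilde{\rho} \in \mathcal{B}^\varepsilon(\rho)$, which concludes the proof. □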

In particular, this means that for a fixed $\varepsilon \in [0,1)$ and $\rho \ll \sigma$, we can always find a finite $\lambda$
such that Lemma 6.9 holds. To see this, note that $\varepsilon(\lambda) = \sqrt{2\,\mathrm{Tr}(\Sigma) - \mathrm{Tr}(\Sigma)^2}$ is continuous in $\lambda$
with $\varepsilon(D_{\max}(\rho\|\sigma)) = 0$ and $\lim_{\lambda \to -\infty} \varepsilon(\lambda) = 1$.
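The construction of Lemma 6.9 can be checked numerically in the commuting (classical) case, where all operators are diagonal and the smoothed state reduces to the entrywise minimum of $\rho$ and $\exp(\lambda)\sigma$. The following is a minimal sketch under that assumption, using natural logarithms; the probability vectors and the function names are hypothetical and chosen only for illustration.

```python
import math

def d_max(p, q):
    """Relative max-divergence log max_i p_i / q_i (natural logarithm)."""
    return math.log(max(pi / qi for pi, qi in zip(p, q) if pi > 0))

def purified_distance(a, b):
    """Purified distance between (sub-normalized) probability vectors,
    computed from the generalized fidelity."""
    overlap = sum(math.sqrt(ai * bi) for ai, bi in zip(a, b))
    overlap += math.sqrt(max(0.0, (1 - sum(a)) * (1 - sum(b))))
    return math.sqrt(max(0.0, 1 - overlap ** 2))

def lemma_6_9(p, q, lam):
    """Classical instance of the construction in Lemma 6.9.

    Returns the smoothing parameter eps and the smoothed vector."""
    big_lambda = [math.exp(lam) * qi for qi in q]           # exp(lambda) * sigma
    positive_part = [max(pi - li, 0.0) for pi, li in zip(p, big_lambda)]
    tr = sum(positive_part)
    eps = math.sqrt(2 * tr - tr * tr)
    # For commuting operators G rho G^dagger has entries min(p_i, Lambda_i).
    smoothed = [min(pi, li) for pi, li in zip(p, big_lambda)]
    return eps, smoothed

p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]
lam = math.log(1.5)
eps, smoothed = lemma_6_9(p, q, lam)
dist = purified_distance(smoothed, p)
print(eps, dist, d_max(smoothed, q))
```

Here the smoothed vector stays within purified distance `eps` of `p`, while its max-divergence relative to `q` drops to `lam`, in line with the statement of the lemma.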

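Our main tool for proving the fully quantum AEP is a family of inequalities that relate the
smooth max-divergence to quantum Rényi divergences for $\alpha \in (1,\infty)$.

Proposition 6.5. Let $\rho \in \mathcal{S}_\circ(A)$, $\sigma \in \mathcal{P}(A)$, $0 < \varepsilon < 1$ and $\alpha \in (1,\infty)$. Then,

$$D^\varepsilon_{\max}(\rho\|\sigma) \leq D_\alpha(\rho\|\sigma) + \frac{g(\varepsilon)}{\alpha - 1}, \tag{6.85}$$

where $g(\varepsilon) = -\log\big(1 - \sqrt{1 - \varepsilon^2}\big)$ and $D_\alpha$ is any quantum Rényi divergence.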


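Proof. If $\rho \not\ll \sigma$ the bound holds trivially, so for the following we have $\rho \ll \sigma$. Furthermore,
since the divergences are invariant under isometries we can assume that $\sigma > 0$ is invertible.

We then choose $\lambda$ such that Lemma 6.9 holds for the $\varepsilon$ specified above. Next, we introduce the
operator $X = \rho - \exp(\lambda)\sigma$ with eigenbasis $\{|e_i\rangle\}_{i \in S}$. The set $S^+ \subseteq S$ contains the indices $i$
corresponding to positive eigenvalues of $X$. Hence, $\{X > 0\} X \{X > 0\} = \Sigma$ as defined in Lemma 6.9.
Furthermore, let $r_i = \langle e_i| \rho |e_i\rangle \geq 0$ and $s_i = \langle e_i| \sigma |e_i\rangle > 0$. It follows that

$$\forall i \in S^+ : \ r_i - \exp(\lambda) s_i > 0 \quad \text{and, thus,} \quad \frac{r_i}{s_i} \exp(-\lambda) > 1 .$$

For any $\alpha \in (1,\infty)$, we bound $\mathrm{Tr}(\Sigma) = 1 - \sqrt{1 - \varepsilon^2}$ as follows:

\begin{align}
1 - \sqrt{1 - \varepsilon^2} = \mathrm{Tr}(\Sigma) &= \sum_{i \in S^+} \big(r_i - \exp(\lambda) s_i\big) \leq \sum_{i \in S^+} r_i \tag{6.86} \\
&\leq \sum_{i \in S^+} r_i \Big(\frac{r_i}{s_i} \exp(-\lambda)\Big)^{\alpha - 1} \tag{6.87} \\
&\leq \exp(-\lambda(\alpha - 1)) \sum_{i \in S} r_i^\alpha s_i^{1-\alpha} . \tag{6.88}
\end{align}

Hence, taking the logarithm and dividing by $\alpha - 1 > 0$, we get

$$\lambda \leq \frac{1}{\alpha - 1} \log\Big(\sum_{i \in S} r_i^\alpha s_i^{1-\alpha}\Big) + \frac{1}{\alpha - 1} \log \frac{1}{1 - \sqrt{1 - \varepsilon^2}} . \tag{6.89}$$

Next, we use the data-processing inequality of the Rényi divergences. We use the measurement
CPTP map $\mathcal{M} : X \mapsto \sum_{i \in S} |e_i\rangle\langle e_i| X |e_i\rangle\langle e_i|$ to obtain

$$D_\alpha(\rho\|\sigma) \geq D_\alpha(\mathcal{M}(\rho)\|\mathcal{M}(\sigma)) = \frac{1}{\alpha - 1} \log\Big(\sum_{i \in S} r_i^\alpha s_i^{1-\alpha}\Big) . \tag{6.90}$$

We conclude the proof by substituting this into (6.89) and applying Lemma 6.9. □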

We also note here that $g(\varepsilon)$ can be bounded by simpler expressions. For example,
$1 - \sqrt{1 - \varepsilon^2} \geq \frac{1}{2}\varepsilon^2$ using a second order Taylor expansion of the expression around $\varepsilon = 0$ and the fact
that the third derivative is non-negative. This is a very good approximation for small $\varepsilon$. Hence,
(6.85) can be simplified to [155]

$$D^\varepsilon_{\max}(\rho\|\sigma) \leq D_\alpha(\rho\|\sigma) + \frac{1}{\alpha - 1} \log \frac{2}{\varepsilon^2} . \tag{6.91}$$
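To get a feeling for how tight the simplification of $g(\varepsilon)$ is, one can compare it numerically with the bound $\log(2/\varepsilon^2)$ that follows from $1 - \sqrt{1 - \varepsilon^2} \geq \frac{1}{2}\varepsilon^2$. The sketch below assumes natural logarithms; the sample values of $\varepsilon$ and the name `g_simple` are illustrative only.

```python
import math

def g(eps):
    """Correction term g(eps) = -log(1 - sqrt(1 - eps^2))."""
    return -math.log(1.0 - math.sqrt(1.0 - eps * eps))

def g_simple(eps):
    """Simpler upper bound log(2 / eps^2) on g(eps)."""
    return math.log(2.0 / (eps * eps))

rows = [(eps, g(eps), g_simple(eps)) for eps in (0.5, 0.1, 0.01)]
for eps, exact, simple in rows:
    print(f"eps={eps}: g={exact:.6f}, log(2/eps^2)={simple:.6f}, gap={simple - exact:.2e}")
```

The gap between the two expressions shrinks roughly like $\varepsilon^2/4$, so little is lost for small smoothing parameters. For very small $\varepsilon$ the direct evaluation of `g` suffers from cancellation, which this sketch does not attempt to address.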

Proposition 6.5 is of particular interest when applied to the smooth conditional min-entropy.
In this case, let $\rho_{AB} \in \mathcal{S}_\circ(AB)$ and $\sigma_{AB}$ be of the form $I_A \otimes \sigma_B$. Then, for any $\alpha \in (1,\infty)$, we have

$$H^\varepsilon_{\min}(A|B)_\rho \geq H_\alpha(A|B)_\rho - \frac{g(\varepsilon)}{\alpha - 1}, \tag{6.92}$$

where we again take $H_\alpha$ to be any conditional Rényi entropy whose underlying divergence
satisfies the data-processing inequality. The duality relation for the smooth min- and max-entropies
(cf. Proposition 6.2) and the Rényi entropies (cf. Sec. 5.3) yield a corresponding dual relation for
the max-entropy.


6.4.2 The Asymptotic Equipartition Property 

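In this section we now apply Proposition 6.5 to two sequences $\{\rho^n\}_n$ and $\{\sigma^n\}_n$ of product states
of the form

$$\rho^n = \bigotimes_{i=1}^n \rho_i, \quad \sigma^n = \bigotimes_{i=1}^n \sigma_i, \quad \text{with } \rho_i, \sigma_i \in \mathcal{S}_\circ(A) \tag{6.93}$$

where we assume for mathematical simplicity that the marginal states $\rho_i$ and $\sigma_i$ are taken from a
finite subset of $\mathcal{S}_\circ(A)$. Proposition 6.5 then yields

$$D^\varepsilon_{\max}(\rho^n\|\sigma^n) \leq \sum_{i=1}^n D_\alpha(\rho_i\|\sigma_i) + \frac{g(\varepsilon)}{\alpha - 1} . \tag{6.94}$$

We can further bound the smooth max-divergence in Proposition 6.5 using the Taylor series
expansion for the Rényi divergence in (4.102). This means that there exists a constant $C$ such that,
for all $\alpha \in (1,2]$ and all $\rho_i$ and $\sigma_i$, we have*

$$D_\alpha(\rho_i\|\sigma_i) \leq D(\rho_i\|\sigma_i) + (\alpha - 1) \frac{\log(e)}{2} V(\rho_i\|\sigma_i) + (\alpha - 1)^2 C . \tag{6.95}$$

It is often not necessary to specify the constant $C$ in the above expression. However, it is possible
to give explicit bounds, which is done, for example, in [155]. Substituting the above into (6.94)
and setting $\alpha = 1 + \frac{1}{\sqrt{n}}$ yields

$$\frac{1}{n} D^\varepsilon_{\max}(\rho^n\|\sigma^n) \leq \frac{1}{n} \sum_{i=1}^n D(\rho_i\|\sigma_i) + \frac{1}{\sqrt{n}} \Big( g(\varepsilon) + \frac{\log(e)}{2} \frac{1}{n} \sum_{i=1}^n V(\rho_i\|\sigma_i) \Big) + \frac{C}{n} . \tag{6.96}$$

Hence, in particular for the iid case where $\rho_i = \rho$ and $\sigma_i = \sigma$ for all $i$, we find:


* Here we use that $\rho_i$ and $\sigma_i$ are taken from a finite set, so that we can choose $C$ uniformly.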
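
Theorem 6.3. Let $\rho \in \mathcal{S}_\circ(A)$, $\sigma \in \mathcal{P}(A)$ and $\varepsilon \in (0,1)$. Then,

$$\lim_{n \to \infty} \Big\{ \frac{1}{n} D^\varepsilon_{\max}(\rho^{\otimes n}\|\sigma^{\otimes n}) \Big\} \leq D(\rho\|\sigma) . \tag{6.97}$$


This is the main ingredient of our proof of the AEP below.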


Direct Part 

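In this section, we are mostly interested in the application of Thm. 6.3 to conditional min- and
max-entropies. Here, for any state $\rho_{AB} \in \mathcal{S}_\circ(AB)$, we choose $\sigma_{AB} = I_A \otimes \rho_B$. Clearly,

$$H^\varepsilon_{\min}(A^n|B^n)_{\rho^{\otimes n}} \geq -D^\varepsilon_{\max}(\rho_{AB}^{\otimes n}\|\sigma_{AB}^{\otimes n}) . \tag{6.98}$$

Thus, by Thm. 6.3, we have

\begin{align}
\lim_{n \to \infty} \Big\{ \frac{1}{n} H^\varepsilon_{\min}(A^n|B^n)_{\rho^{\otimes n}} \Big\} &\geq \lim_{n \to \infty} \Big\{ -\frac{1}{n} D^\varepsilon_{\max}(\rho_{AB}^{\otimes n}\|\sigma_{AB}^{\otimes n}) \Big\} \tag{6.99} \\
&\geq -D(\rho_{AB}\|\sigma_{AB}) = H(A|B)_\rho . \tag{6.100}
\end{align}

This and the dual of this relation lead to the following corollary, which is the direct part of
the AEP.


Corollary 6.2. Let $\rho_{AB} \in \mathcal{S}_\circ(AB)$ and $0 < \varepsilon < 1$. Then, the smooth entropies of the i.i.d.
product state $\rho_{A^n B^n} = \rho_{AB}^{\otimes n}$ satisfy

\begin{align}
\lim_{n \to \infty} \Big\{ \frac{1}{n} H^\varepsilon_{\min}(A^n|B^n)_\rho \Big\} &\geq H(A|B)_\rho \quad \text{and} \tag{6.101} \\
\lim_{n \to \infty} \Big\{ \frac{1}{n} H^\varepsilon_{\max}(A^n|B^n)_\rho \Big\} &\leq H(A|B)_\rho . \tag{6.102}
\end{align}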

Converse Part 

To prove asymptotic convergence, we will also need converse bounds. For $\varepsilon = 0$, the converse
bounds are a consequence of the monotonicity of the conditional Rényi entropies in $\alpha$, i.e.
$H_{\min}(A|B)_\rho \leq H(A|B)_\rho \leq H_{\max}(A|B)_\rho$ for normalized states $\rho_{AB} \in \mathcal{S}_\circ(AB)$. For $\varepsilon > 0$,
similar bounds can be derived based on the continuity of the conditional von Neumann entropy in
the state [2]. However, such bounds do not allow a statement of the form of Corollary 6.2 as the
deviation from the von Neumann entropy scales as $n f(\varepsilon)$, where $f(\varepsilon) \to 0$ only for $\varepsilon \to 0$. (See,
for example, [155] for such a weak converse bound.) This is not sufficient for some applications
of the asymptotic equipartition property.

Here, we prove a tighter bound, which relies on the bound between smooth max-entropy and
smooth min-entropy established in Proposition 6.3. Employing this in conjunction with (6.101)
and (6.102) establishes the converse AEP bounds. Let $0 < \varepsilon < 1$. Then, using any smoothing
parameter $0 < \varepsilon' < 1 - \varepsilon$, we bound

$$\frac{1}{n} H^\varepsilon_{\min}(A^n|B^n)_\rho \leq \frac{1}{n} H^{\varepsilon'}_{\max}(A^n|B^n)_\rho + \frac{1}{n} \log \frac{1}{1 - (\varepsilon + \varepsilon')^2} . \tag{6.103}$$
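One way to see how a bound of the form (6.103) can follow from Proposition 6.3 is sketched below. The derivation assumes the identification $\varepsilon = \sin\varphi$ and $\varepsilon' = \sin\vartheta$ with $\varphi, \vartheta \in [0, \frac{\pi}{2})$; it is an illustrative reconstruction and not taken verbatim from the text.

```latex
\begin{align*}
\varepsilon' < 1 - \varepsilon \leq \sqrt{1 - \varepsilon^2} = \cos\varphi
  \quad &\Longrightarrow \quad \vartheta < \tfrac{\pi}{2} - \varphi , \\
\sin(\varphi + \vartheta)
  = \varepsilon \sqrt{1 - \varepsilon'^2} + \varepsilon' \sqrt{1 - \varepsilon^2}
  \ &\leq \ \varepsilon + \varepsilon' < 1 , \\
2 \log \frac{1}{\cos(\varphi + \vartheta)}
  = \log \frac{1}{1 - \sin^2(\varphi + \vartheta)}
  \ &\leq \ \log \frac{1}{1 - (\varepsilon + \varepsilon')^2} .
\end{align*}
```

The first line shows that the condition $\varphi + \vartheta < \frac{\pi}{2}$ of Proposition 6.3 is met. Since the resulting correction term does not depend on $n$, it vanishes after dividing by $n$ and taking the limit.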


The corresponding statement for the smooth max-entropy follows analogously. Starting from (6.103)
we then apply the same argument that led to Corollary 6.2 in order to establish the following
converse part of the AEP.


Corollary 6.3. Let $\rho_{AB} \in \mathcal{S}_\circ(AB)$ and $0 < \varepsilon < 1$. Then, the smooth entropies of the i.i.d.
product state $\rho_{A^n B^n} = \rho_{AB}^{\otimes n}$ satisfy

\begin{align}
\lim_{n \to \infty} \Big\{ \frac{1}{n} H^\varepsilon_{\min}(A^n|B^n)_\rho \Big\} &\leq H(A|B)_\rho \quad \text{and} \tag{6.104} \\
\lim_{n \to \infty} \Big\{ \frac{1}{n} H^\varepsilon_{\max}(A^n|B^n)_\rho \Big\} &\geq H(A|B)_\rho . \tag{6.105}
\end{align}


These converse bounds are particularly important to bound the smooth entropies for large
smoothing parameters. In this form, the AEP implies strong converse statements for many
information theoretic tasks that can be characterized by smooth entropies in the one-shot setting.


Second Order 

It is in fact possible to derive more refined bounds here, in analogy with the second-order
refinement for Stein's lemma encountered in Sec. 7.1. First we note that from the above arguments we
can deduce that the second-order term scales as

$$D^\varepsilon_{\max}(\rho^{\otimes n}\|\sigma^{\otimes n}) = n D(\rho\|\sigma) + O(\sqrt{n}), \tag{6.106}$$

and thus it suggests itself to try to find an exact expression for the $O(\sqrt{n})$ term.† One finds that
the second-order expansion of $D^\varepsilon_{\max}(\rho^{\otimes n}\|\sigma^{\otimes n})$ is given as [157]

$$D^\varepsilon_{\max}(\rho^{\otimes n}\|\sigma^{\otimes n}) = n D(\rho\|\sigma) - \sqrt{n V(\rho\|\sigma)}\, \Phi^{-1}(\varepsilon^2) + O(\log n), \tag{6.107}$$

where $\Phi$ is the cumulative (normal) Gaussian distribution function. A more detailed discussion
of this is outside the scope of this book and we defer to [157] instead.
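For commuting states the first two terms of the expansion (6.107) are straightforward to evaluate. The sketch below assumes classical probability vectors with full support, natural logarithms, and simply drops the $O(\log n)$ remainder, so it is an approximation rather than a bound; the distributions and function names are hypothetical.

```python
import math
from statistics import NormalDist

def relative_entropy(p, q):
    """D(p||q) in nats for probability vectors with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def relative_variance(p, q):
    """V(p||q): variance of the log-likelihood ratio under p."""
    d = relative_entropy(p, q)
    return sum(pi * (math.log(pi / qi) - d) ** 2 for pi, qi in zip(p, q))

def second_order_estimate(p, q, n, eps):
    """First- and second-order terms n*D - sqrt(n*V) * Phi^{-1}(eps^2)."""
    d = relative_entropy(p, q)
    v = relative_variance(p, q)
    return n * d - math.sqrt(n * v) * NormalDist().inv_cdf(eps ** 2)

p = [0.2, 0.8]
q = [0.5, 0.5]
d = relative_entropy(p, q)
v = relative_variance(p, q)
for n in (100, 10_000, 1_000_000):
    rate = second_order_estimate(p, q, n, eps=0.1) / n
    print(n, rate, d)
```

For $\varepsilon^2 < \frac{1}{2}$ the inverse Gaussian term is negative, so the estimate lies above $n D(\rho\|\sigma)$, and the rate approaches $D(\rho\|\sigma)$ at speed $1/\sqrt{n}$.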


6.5 Background and Further Reading 

This chapter is largely based on [152, Chap. 4-5]. The exposition here is more condensed
compared to [152]. On the other hand, some results are revisited and generalized in light of a better
understanding of the underlying conditional Rényi entropies.

† Analytic bounds on the second-order term were also investigated in [11].

The origins of the smooth entropy calculus can be found in classical cryptography, for example
the work of Cachin [32]. Renner and Wolf [141] first introduced the classical special case of the
formalism used in this book. The formalism was then generalized to the quantum setting by
Renner and König [140] in order to investigate randomness extraction against quantum adversaries
in cryptography [99]. Based on this initial work, Renner [139] then defined conditional smooth
entropies in the quantum setting. He chose $\widetilde{H}^{\uparrow}_{\infty}$ as the min-entropy (as we do here as well) and
he chose $\bar{H}^{\uparrow}_{0}$ as the max-entropy. Later König, Renner and Schaffner [101] discovered that $\widetilde{H}^{\uparrow}_{1/2}$
naturally complements the min-entropy due to the duality relation between the two quantities.
Consequently, the max-entropy is defined as $\widetilde{H}^{\uparrow}_{1/2}$ in most recent work. (Notably, at the time the
structure of conditional Rényi entropies as discussed in this book, in particular the duality
relation, was only known in special cases.) Moreover, Renner [139] initially used a metric based on
the trace distance to define the $\varepsilon$-ball of close states. However, in order for the duality relation
to hold for smooth min- and max-entropies, it was later found that the purified distance [156] is
more appropriate.

The chain rules were derived by Vitanov et al. [168,169], based on preliminary results in [20, 
160]. The specialized chain rules for classical information in Lemmas 6.7 and 6.8 were partially 
developed in [138] and [176], and extended in [152]. 

A first achievability bound for the quantum AEP for the smooth min-entropy was established 
in Renner’s thesis [139]. However, the quantum AEP presented here is due to [155] and [152]; it 
is conceptually simpler and leads to tighter bounds as well as a strong converse statement. It is 
also noteworthy that a hallmark result of quantum information theory, the strong sub-additivity of 
the von Neumann entropy (5.6), can be derived from elementary principles using the AEP [14]. 

The smooth min-entropy of classical-quantum states has operational meaning in randomness
extraction, as will be discussed in some detail in Section 7.3. Decoupling is a natural
generalization of randomness extraction to the fully quantum setting (see Dupuis' thesis [48] for a
comprehensive overview), and was initially studied in the context of state merging by Horodecki,
Oppenheim and Winter [90]. Decoupling theorems can also be expressed in the one-shot setting, where
the (fully quantum) smooth min-entropy attains operational significance [19,49,150].

Smooth entropies have been used to characterize various information theoretic tasks in the
one-shot setting, for example in [138] and [42-44]. The framework has also been used to investigate
the relation between randomness extraction and data compression with side information [136].
Smooth entropies have also found various applications in quantum thermodynamics, for example 
they are used to derive a thermodynamical interpretation of negative conditional entropy [47]. 

We have restricted our attention to finite-dimensional quantum systems here, but it is worth 
noting that the definitions of the smooth min- and max-entropies can be extended without much 
trouble to the case where the side information is modeled by an infinite-dimensional Hilbert 
space [60] or a general von Neumann algebra [22]. Many of the properties discussed here extend 
to these strictly more general settings. However, general chain rules and an entropic asymptotic 
equipartition property are not yet established in the most general algebraic setting [22]. 



Chapter 7 

Selected Applications 


This chapter gives a taste of the applications of the mathematical toolbox discussed in this book, 
biased by the author’s own interests. 

The discussion of binary hypothesis testing is crucial because it provides an operational
interpretation for the two quantum generalizations of the Rényi divergence we treated in this book.
This belatedly motivates our specific choice. Entropic uncertainty relations provide a compelling
application of conditional Rényi entropies and their properties, in particular the duality relation.
Finally, smooth entropies were originally invented in the context of cryptography, and the
Leftover Hashing Lemma reveals why this definition has proven so useful.


7.1 Binary Quantum Hypothesis Testing 

As mentioned before, the Petz and the minimal quantum Rényi divergence both find operational
significance in binary quantum hypothesis testing. We thus start by surveying binary hypothesis
testing for quantum states. However, the proofs of the statements in this section are outside the
scope of this book, and we will refer to the published primary literature instead.

Let us consider the following binary hypothesis testing problem. Let $\rho, \sigma \in \mathcal{S}_\circ(A)$ be two
states. The null-hypothesis is that a certain preparation procedure leaves system $A$ in the state $\rho$,
whereas the alternate hypothesis is that it leaves it in the state $\sigma$. If this preparation is repeated
independently $n \in \mathbb{N}$ times, we consider the following two hypotheses.

Null Hypothesis: The state of $A^n$ is $\rho^{\otimes n}$.

Alternate Hypothesis: The state of $A^n$ is $\sigma^{\otimes n}$.

A hypothesis test for this setup is an event $T_n \in \mathcal{P}_\bullet(A^n)$ that indicates that the null-hypothesis is
correct. The error of the first kind, $\alpha_n(T_n)$, is defined as the probability that we wrongly conclude
that the alternate hypothesis is correct even if the state is $\rho^{\otimes n}$. It is given by

$$\alpha_n(T_n; \rho) := \mathrm{Tr}\big(\rho^{\otimes n} (I - T_n)\big) . \tag{7.1}$$

Conversely, the error of the second kind, $\beta_n(T_n)$, is defined as the probability that we wrongly
conclude that the null hypothesis is correct even if the state is $\sigma^{\otimes n}$. It is given by

$$\beta_n(T_n; \sigma) := \mathrm{Tr}\big(\sigma^{\otimes n} T_n\big) . \tag{7.2}$$


7.1.1 Chernoff Bound 


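We now want to understand how these errors behave for large $n$ if we choose an optimal test. Let
us first minimize the average of these two errors (assuming equal priors) over all hypothesis tests,
which leads us to the well-known distinguishing advantage (cf. Section 3.2):

$$\min_{T_n \in \mathcal{P}_{\bullet}(A^n)} \frac{1}{2}\big(\alpha_n(T_n;\rho) + \beta_n(T_n;\sigma)\big)
= \frac{1}{2} + \frac{1}{2} \min_{T_n \in \mathcal{P}_{\bullet}(A^n)} \operatorname{Tr}\big(T_n(\sigma^{\otimes n} - \rho^{\otimes n})\big)
= \frac{1}{2}\big(1 - \Delta(\rho^{\otimes n}, \sigma^{\otimes n})\big). \tag{7.3}$$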


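However, this expression is often not very useful in itself since we do not know how
$\Delta(\rho^{\otimes n}, \sigma^{\otimes n})$ behaves as $n$ gets large. This is answered by the quantum
Chernoff bound, which states that the expression in (7.3) drops exponentially fast in $n$ (unless
$\rho = \sigma$, of course) and identifies the exponent [10,127]:


Theorem 7.1. Let $\rho, \sigma \in \mathcal{S}_{\circ}(A)$. Then,

$$\lim_{n\to\infty} \frac{1}{n} \log \min_{T_n \in \mathcal{P}_{\bullet}(A^n)} \frac{1}{2}\big(\alpha_n(T_n;\rho) + \beta_n(T_n;\sigma)\big)
= \min_{0 \le s \le 1} \log \bar{Q}_s(\rho\|\sigma). \tag{7.4}$$

Here $\bar{Q}_s(\rho\|\sigma) = \operatorname{Tr}(\rho^{s}\sigma^{1-s})$ and
$\bar{D}_s(\rho\|\sigma) = \frac{1}{s-1}\log \bar{Q}_s(\rho\|\sigma)$ denote the Petz quantities.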


This gives a first operational interpretation of the Petz quantum Rényi divergence for $\alpha \in (0,1)$.

Note that the exponent on the right-hand side is negative and symmetric in $\rho$ and $\sigma$. The
objective function is also strictly convex in $s$ and hence the minimum is unique unless $\rho = \sigma$.
The negative exponent is also called the Chernoff distance between $\rho$ and $\sigma$, defined as

$$\xi_C(\rho,\sigma) := -\min_{0 \le s \le 1} \log \bar{Q}_s(\rho\|\sigma) = \max_{0 \le s \le 1}\, (1-s)\,\bar{D}_s(\rho\|\sigma). \tag{7.5}$$

In particular, we have $\xi_C(\rho,\sigma) \le D(\rho\|\sigma)$ since $(1-s) \le 1$ in (7.5) and
$\bar{D}_s(\rho\|\sigma) \le D(\rho\|\sigma)$ for $s \le 1$.
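As a numerical illustration of (7.3) and the Chernoff bound, the following sketch evaluates the optimal average error $\frac{1}{2}(1-\Delta(\rho^{\otimes n},\sigma^{\otimes n}))$ for a few small $n$ and compares its decay rate with the Chernoff distance, obtained here by a plain grid search over $s$. The two qubit states, the grid size and all function names are assumptions made for illustration only; natural logarithms are used throughout.

```python
import numpy as np

def mpow(a, s):
    """Return a**s for a positive definite Hermitian matrix a."""
    w, v = np.linalg.eigh(a)
    return (v * w**s) @ v.conj().T

def mlog(a):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T

def trace_distance(rho, sigma):
    """Delta(rho, sigma) = (1/2) ||rho - sigma||_1 for normalized states."""
    return 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()

def chernoff_distance(rho, sigma, grid=1001):
    """-min_s log Tr(rho^s sigma^(1-s)), with the minimum taken over a grid in [0, 1]."""
    q = [np.trace(mpow(rho, s) @ mpow(sigma, 1 - s)).real
         for s in np.linspace(0.0, 1.0, grid)]
    return -np.log(min(q))

def relative_entropy(rho, sigma):
    return np.trace(rho @ (mlog(rho) - mlog(sigma))).real

def tensor_power(a, n):
    out = np.eye(1)
    for _ in range(n):
        out = np.kron(out, a)
    return out

# Hypothetical full-rank qubit states, chosen only for illustration.
rho = np.array([[0.9, 0.0], [0.0, 0.1]])
sigma = np.array([[0.5, 0.3], [0.3, 0.5]])

xi = chernoff_distance(rho, sigma)
errors = []
for n in range(1, 7):
    delta = trace_distance(tensor_power(rho, n), tensor_power(sigma, n))
    errors.append(0.5 * (1 - delta))          # optimal average error, cf. (7.3)
    print(n, errors[-1], -np.log(errors[-1]) / n)
print("Chernoff distance:", xi, "relative entropy:", relative_entropy(rho, sigma))
```

The per-copy rate $-\frac{1}{n}\log(\text{error})$ printed in the last column can be compared with the Chernoff distance; the theorem only asserts agreement in the limit $n \to \infty$, so some gap at small $n$ is to be expected.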


7.1.2 Stein’s Lemma 

In the Chernoff bound we treated the two kinds of errors (of the first and second kind) symmetrically,
but this is not always desirable. Let us thus in the following consider sequences of tests
$\{T_n\}_n$ such that $\beta_n(T_n;\sigma) \le \varepsilon_n$ for some sequence $\{\varepsilon_n\}_n$ with
$\varepsilon_n \in [0,1]$. We are then interested in the quantities

$$\alpha_n^*(\varepsilon_n;\rho,\sigma) := \min\big\{ \alpha_n(T_n;\rho) : T_n \in \mathcal{P}_{\bullet}(A^n) \wedge \beta_n(T_n;\sigma) \le \varepsilon_n \big\}. \tag{7.6}$$

Let us first consider the sequence $\varepsilon_n = \exp(-nR)$. Quantum Stein’s lemma now tells us that
$D(\rho\|\sigma)$ is a critical rate for $R$ in the following sense [86,128].

Theorem 7.2. Let $\rho, \sigma \in \mathcal{S}_{\circ}(A)$ with $\rho \ll \sigma$. Then,

$$\lim_{n\to\infty} \alpha_n^*\big(\exp(-nR);\rho,\sigma\big) =
\begin{cases} 0 & \text{if } R < D(\rho\|\sigma) \\ 1 & \text{if } R > D(\rho\|\sigma) \end{cases}. \tag{7.7}$$

This establishes the operational interpretation of Umegaki’s quantum relative entropy. In fact,
the respective convergence to 0 and 1 is exponential in $n$, as we will see below. An alternative
formulation of Stein’s lemma states that, for any $\varepsilon \in (0,1)$, we have

$$\lim_{n\to\infty} -\frac{1}{n} \log \min\big\{ \beta_n(T_n;\sigma) : T_n \in \mathcal{P}_{\bullet}(A^n) \wedge \alpha_n(T_n;\rho) \le \varepsilon \big\} = D(\rho\|\sigma). \tag{7.8}$$


Second Order Refinements for Stein’s Lemma 


A natural question then is to investigate what happens if $-\log \varepsilon_n \approx nD(\rho\|\sigma)$ plus
some small variation that grows slower than $n$. This is covered by the second order refinement of
quantum Stein’s lemma [105,157].


Theorem 7.3. Let $\rho, \sigma \in \mathcal{S}_{\circ}(A)$ with $\rho \ll \sigma$ and $r \in \mathbb{R}$. Then,

$$\lim_{n\to\infty} \alpha_n^*\big(\exp(-nD(\rho\|\sigma) - \sqrt{n}\,r);\rho,\sigma\big) = \Phi\!\left(\frac{r}{\sqrt{V(\rho\|\sigma)}}\right), \tag{7.9}$$

where $\Phi$ is the cumulative (normal) Gaussian distribution function.


These works also consider a slightly different formulation of the problem in the spirit of (7.8),
and establish that

$$-\log \min\big\{ \beta_n(T_n;\sigma) : T_n \in \mathcal{P}_{\bullet}(A^n) \wedge \alpha_n(T_n;\rho) \le \varepsilon \big\}
= nD(\rho\|\sigma) + \sqrt{nV(\rho\|\sigma)}\,\Phi^{-1}(\varepsilon) + O(\log n). \tag{7.10}$$
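To get a feeling for the size of the second-order term, the following sketch evaluates the first two terms of the expansion (7.10), dropping the $O(\log n)$ remainder. It assumes the relative entropy variance $V(\rho\|\sigma) = \operatorname{Tr}\big(\rho(\log\rho - \log\sigma)^2\big) - D(\rho\|\sigma)^2$ as used in [105,157]; the example states and function names are hypothetical, and natural logarithms are used.

```python
import numpy as np
from statistics import NormalDist

def mlog(a):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T

def divergence_and_variance(rho, sigma):
    """Return (D, V) with V = Tr(rho (log rho - log sigma)^2) - D^2 (assumed definition)."""
    diff = mlog(rho) - mlog(sigma)
    d = np.trace(rho @ diff).real
    v = np.trace(rho @ diff @ diff).real - d**2
    return d, v

def second_order_estimate(rho, sigma, n, eps):
    """n*D + sqrt(n*V) * Phi^{-1}(eps): the expansion (7.10) without the O(log n) term."""
    d, v = divergence_and_variance(rho, sigma)
    return n * d + np.sqrt(n * v) * NormalDist().inv_cdf(eps)

# Hypothetical full-rank qubit states, chosen only for illustration.
rho = np.array([[0.9, 0.0], [0.0, 0.1]])
sigma = np.array([[0.5, 0.3], [0.3, 0.5]])

d, v = divergence_and_variance(rho, sigma)
for n in (10, 100, 1000):
    print(n, n * d, second_order_estimate(rho, sigma, n, 0.05))
```

For $\varepsilon < 1/2$ the correction is negative, so the estimate lies below the first-order value $nD(\rho\|\sigma)$, and the relative size of the correction shrinks like $1/\sqrt{n}$.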


7.1.3 Hoeffding Bound and Strong Converse Exponent 

Another refinement of quantum Stein’s lemma concerns the speed with which the convergence to
zero occurs in (7.7) if $R < D(\rho\|\sigma)$. The quantum Hoeffding bound shows that this convergence
is exponentially fast in $n$, and reveals the optimal exponent [76,123]:

Theorem 7.4. Let $\rho, \sigma \in \mathcal{S}_{\circ}(A)$ and $0 \le R < D(\rho\|\sigma)$. Then,

$$\lim_{n\to\infty} -\frac{1}{n} \log \alpha_n^*\big(\exp(-nR);\rho,\sigma\big) = \sup_{s \in (0,1)} \left\{ \frac{1-s}{s}\big(\bar{D}_s(\rho\|\sigma) - R\big) \right\}. \tag{7.11}$$

This yields a second operational interpretation of Petz’ quantum Rényi divergence.

A similar investigation can be performed in the regime when $R > D(\rho\|\sigma)$, and this time we
find that the convergence to one is exponentially fast in $n$. The strong converse exponent is given
by [119]:


Theorem 7.5. Let $\rho, \sigma \in \mathcal{S}_{\circ}(A)$ with $\rho \ll \sigma$ and $R > D(\rho\|\sigma)$. Then,

$$\lim_{n\to\infty} -\frac{1}{n} \log\big(1 - \alpha_n^*(\exp(-nR);\rho,\sigma)\big) = \sup_{s > 1} \left\{ \frac{s-1}{s}\big(R - \widetilde{D}_s(\rho\|\sigma)\big) \right\}. \tag{7.12}$$


This establishes an operational interpretation of the minimal quantum Rényi divergence for
$\alpha \in (1,\infty)$.
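The two exponents in (7.11) and (7.12) are easy to evaluate numerically for small systems. The sketch below does so by a grid search, assuming the standard expressions $\bar{D}_s(\rho\|\sigma) = \frac{1}{s-1}\log\operatorname{Tr}(\rho^s\sigma^{1-s})$ for the Petz divergence and $\widetilde{D}_s(\rho\|\sigma) = \frac{1}{s-1}\log\operatorname{Tr}\big[(\sigma^{\frac{1-s}{2s}}\rho\,\sigma^{\frac{1-s}{2s}})^s\big]$ for the minimal divergence. The states, the grids (in particular the cut-off `s_max`) and the function names are illustrative assumptions.

```python
import numpy as np

def mpow(a, s):
    """Return a**s for a positive definite Hermitian matrix a."""
    w, v = np.linalg.eigh(a)
    return (v * w**s) @ v.conj().T

def mlog(a):
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T

def relative_entropy(rho, sigma):
    return np.trace(rho @ (mlog(rho) - mlog(sigma))).real

def petz_divergence(rho, sigma, s):
    q = np.trace(mpow(rho, s) @ mpow(sigma, 1 - s)).real
    return np.log(q) / (s - 1)

def minimal_divergence(rho, sigma, s):
    m = mpow(sigma, (1 - s) / (2 * s))
    q = np.trace(mpow(m @ rho @ m, s)).real
    return np.log(q) / (s - 1)

def hoeffding_exponent(rho, sigma, rate, grid=999):
    """Grid approximation of the supremum in (7.11), valid for rate < D(rho||sigma)."""
    return max((1 - s) / s * (petz_divergence(rho, sigma, s) - rate)
               for s in np.linspace(0.001, 0.999, grid))

def strong_converse_exponent(rho, sigma, rate, s_max=20.0, grid=2000):
    """Grid approximation of the supremum in (7.12), truncated at s_max."""
    return max((s - 1) / s * (rate - minimal_divergence(rho, sigma, s))
               for s in np.linspace(1.001, s_max, grid))

# Hypothetical full-rank qubit states, chosen only for illustration.
rho = np.array([[0.9, 0.0], [0.0, 0.1]])
sigma = np.array([[0.5, 0.3], [0.3, 0.5]])
D = relative_entropy(rho, sigma)

print("Hoeffding exponent at R = D/2:", hoeffding_exponent(rho, sigma, 0.5 * D))
print("strong converse exponent at R = 2D:", strong_converse_exponent(rho, sigma, 2 * D))
```

Because the grids are finite, both functions return lower bounds on the respective suprema; a finer grid or a proper one-dimensional optimizer would tighten them.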


7.2 Entropic Uncertainty Relations 

The uncertainty principle [81] is one of quantum physics’ most intriguing phenomena. Here we
are concerned with preparation uncertainty, which states that an observer who has only access to
classical memory cannot predict the outcomes of two incompatible measurements with certainty.
Uncertainty is naturally expressed in terms of entropies, and in fact entropic uncertainty relations
(URs) have found many applications in quantum information theory, specifically in quantum
cryptography.

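Let us now formalize a first entropic UR. For this purpose, let $\{|\phi_x\rangle\}_x$ and
$\{|\vartheta_y\rangle\}_y$ be two ONBs on a system $A$ and $\mathcal{M}_X \in \mathrm{CPTP}(A,X)$ and
$\mathcal{M}_Y \in \mathrm{CPTP}(A,Y)$ the respective measurement maps. Then, Maassen and Uffink’s
entropic UR [112] states that, for any initial state $\rho_A \in \mathcal{S}_{\circ}(A)$, we have

$$H_\alpha(X)_{\mathcal{M}_X(\rho)} + H_\beta(Y)_{\mathcal{M}_Y(\rho)} \ge -\log c, \quad \text{where} \quad c = \max_{x,y} \big|\langle \phi_x | \vartheta_y \rangle\big|^2 \tag{7.13}$$

is the overlap of the two ONBs and the parameters of the Rényi entropies, $\alpha, \beta \in [\frac{1}{2},\infty)$,
satisfy $\frac{1}{\alpha} + \frac{1}{\beta} = 2$. In the following we generalize this relation to conditional
entropies and quantum side information.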
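The relation (7.13) can be checked numerically for a qubit measured in the computational and the Hadamard basis, for which $c = 1/2$. The sketch below samples random states and evaluates the left-hand side for three parameter pairs with $\frac{1}{\alpha} + \frac{1}{\beta} = 2$, including the limiting pair $(\infty, \frac{1}{2})$. The choice of bases, the sampling procedure and all names are assumptions for illustration; natural logarithms are used.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Rényi entropy (natural log) of a probability vector with positive entries."""
    p = np.asarray(p, dtype=float)
    if alpha == 1:
        return float(-np.sum(p * np.log(p)))
    if np.isinf(alpha):
        return float(-np.log(p.max()))
    return float(np.log(np.sum(p**alpha)) / (1 - alpha))

def outcome_distribution(rho, basis):
    """Probabilities <b_k|rho|b_k> for the columns b_k of a unitary matrix."""
    return np.real(np.einsum('ik,ij,jk->k', basis.conj(), rho, basis))

Z = np.eye(2)                                    # computational basis
X = np.array([[1, 1], [1, -1]]) / np.sqrt(2)     # Hadamard basis
c = np.max(np.abs(Z.conj().T @ X) ** 2)          # overlap of the two bases

rng = np.random.default_rng(7)
pairs = [(1.0, 1.0), (2.0, 2.0 / 3.0), (np.inf, 0.5)]   # each satisfies 1/a + 1/b = 2
gaps = []
for _ in range(200):
    g = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    rho = g @ g.conj().T
    rho /= np.trace(rho).real                    # random full-rank qubit state
    p, q = outcome_distribution(rho, Z), outcome_distribution(rho, X)
    for a, b in pairs:
        # Left-hand side of (7.13) minus the bound -log c; should be non-negative.
        gaps.append(renyi_entropy(p, a) + renyi_entropy(q, b) + np.log(c))
print("overlap c =", c, " smallest gap =", min(gaps))
```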


Tripartite Uncertainty Relation 

First, note that an observer with quantum side information that is maximally entangled with $A$ can
predict the outcomes of both measurements perfectly (see, for instance, the discussion in [20]).
This can be remedied by considering two different observers, in which case the monogamy of
entanglement comes to our rescue. We find that the most natural generalization of the
Maassen-Uffink relation is stated for a tripartite quantum system $ABC$ where $A$ is the system being
measured and $B$ and $C$ are two systems containing side information [37, 122].

Theorem 7.6. Let $\rho_{ABC} \in \mathcal{S}_{\circ}(ABC)$ and $\alpha, \beta \in [\frac{1}{2},\infty)$ with
$\frac{1}{\alpha} + \frac{1}{\beta} = 2$. Then,

$$\widetilde{H}^{\uparrow}_{\alpha}(X|B)_{\mathcal{M}_X(\rho)} + \widetilde{H}^{\uparrow}_{\beta}(Y|C)_{\mathcal{M}_Y(\rho)} \ge -\log c, \tag{7.14}$$

with $c$ defined in (7.13).


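Proof. We prove this statement for a pure state $\rho_{ABC}$ and the general statement then follows by
the data-processing inequality. By the duality relation in Proposition 5.3, it suffices to show that

$$\widetilde{H}^{\uparrow}_{\alpha}(X|B)_{\mathcal{M}_X(\rho)} \ge \widetilde{H}^{\uparrow}_{\alpha}(Y|Y'B)_{\mathcal{U}_Y(\rho)} - \log c, \tag{7.15}$$

where $\mathrm{CPTP}(A,YY') \ni \mathcal{U}_Y : \rho_A \mapsto \sum_{y,y'} \langle \vartheta_y | \rho_A | \vartheta_{y'} \rangle\, |y\rangle\langle y'|_Y \otimes |y\rangle\langle y'|_{Y'}$
is the map corresponding to the Stinespring dilation unitary of $\mathcal{M}_Y$. Let us now verify (7.15).
We have

$$\widetilde{H}^{\uparrow}_{\alpha}(Y|Y'B)_{\mathcal{U}_Y(\rho)} = \max_{\sigma_{Y'B} \in \mathcal{S}_{\circ}(Y'B)} -\widetilde{D}_{\alpha}\big(\mathcal{U}_Y(\rho_{AB}) \,\big\|\, I_Y \otimes \sigma_{Y'B}\big) \tag{7.16}$$

$$\le \max_{\sigma_{Y'B} \in \mathcal{S}_{\circ}(Y'B)} -\widetilde{D}_{\alpha}\big(\rho_{AB} \,\big\|\, \mathcal{U}_Y^{-1}(I_Y \otimes \sigma_{Y'B})\big) \tag{7.17}$$

$$\le \max_{\sigma_{Y'B} \in \mathcal{S}_{\circ}(Y'B)} -\widetilde{D}_{\alpha}\big(\mathcal{M}_X(\rho_{AB}) \,\big\|\, \mathcal{M}_X(\mathcal{U}_Y^{-1}(I_Y \otimes \sigma_{Y'B}))\big). \tag{7.18}$$

The first inequality follows by the data-processing inequality, pinching the states so that they are
block-diagonal with regard to the image of $\mathcal{U}_Y$ and its complement. We can then disregard the
block outside the image since $\mathcal{U}_Y(\rho_{AB})$ has no weight there, using the mean Property (VI). The
second inequality is due to data-processing with $\mathcal{M}_X$. Now, note that for every $\sigma_{Y'B}$, we have

$$\mathcal{M}_X\big(\mathcal{U}_Y^{-1}(I_Y \otimes \sigma_{Y'B})\big) = \mathcal{M}_X\Big( \sum_y |\vartheta_y\rangle\langle\vartheta_y|_A \otimes \langle y|_{Y'}\, \sigma_{Y'B}\, |y\rangle_{Y'} \Big) \tag{7.19}$$

$$= \sum_{x,y} \big|\langle \phi_x | \vartheta_y \rangle\big|^2\, |x\rangle\langle x|_X \otimes \langle y|_{Y'}\, \sigma_{Y'B}\, |y\rangle_{Y'} \tag{7.20}$$

$$\le c \sum_{x,y} |x\rangle\langle x|_X \otimes \langle y|_{Y'}\, \sigma_{Y'B}\, |y\rangle_{Y'} = c\, I_X \otimes \sigma_B. \tag{7.21}$$

Substituting this into (7.18), and using that $\widetilde{D}_{\alpha}$ is non-increasing in its second argument
with $\widetilde{D}_{\alpha}(\rho\|c\,\sigma) = \widetilde{D}_{\alpha}(\rho\|\sigma) - \log c$, yields the desired inequality.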


Bipartite Uncertainty Relation 

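Based on the tripartite UR in Theorem 7.6, we can now explore bipartite URs with only one
side information system. To establish such a UR, we start from (7.15) and use the chain rule in
Theorem 5.1 to find

$$\widetilde{H}^{\uparrow}_{\alpha}(X|B)_{\mathcal{M}_X(\rho)} \ge \widetilde{H}^{\uparrow}_{\gamma}(YY'|B)_{\mathcal{U}_Y(\rho)} - \widetilde{H}^{\uparrow}_{\beta}(Y'|B)_{\mathcal{U}_Y(\rho)} - \log c, \tag{7.22}$$

where we chose $\beta, \gamma \ge \frac{1}{2}$ such that

$$\frac{\gamma}{\gamma - 1} = \frac{\alpha}{\alpha - 1} + \frac{\beta}{\beta - 1} \quad \text{and} \quad (\alpha-1)(\beta-1)(\gamma-1) < 0. \tag{7.23}$$

Then, using the fact that the marginals on $YB$ and $Y'B$ of the state $\mathcal{U}_Y(\rho_{AB}) \in \mathcal{S}_{\circ}(YY'B)$ are
equivalent and that the conditional entropies are invariant under local isometries, we conclude
that

$$\widetilde{H}^{\uparrow}_{\alpha}(X|B)_{\mathcal{M}_X(\rho)} + \widetilde{H}^{\uparrow}_{\beta}(Y|B)_{\mathcal{M}_Y(\rho)} \ge \widetilde{H}^{\uparrow}_{\gamma}(A|B)_{\rho} + \log\frac{1}{c}. \tag{7.24}$$

Interesting limiting cases include $\alpha = 2$, $\beta \to \frac{1}{2}$, and $\gamma \to \infty$ as well as
$\alpha, \beta, \gamma \to 1$.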

Clearly, variations of this relation can be shown using different conditional entropies or chain
rules. However, all bipartite URs share the property that on the right-hand side of the inequality
there appears a conditional entropy of the state $\rho_{AB}$ prior to measurement. This quantity can be
negative in the presence of entanglement, and in particular for the case of a maximally entangled
state the term on the right-hand side becomes negative or zero and the bound thus trivial.


7.3 Randomness Extraction 

One of the main applications of the smooth entropy framework is in cryptography, in particular in
randomness extraction, the art of extracting uniform randomness from a biased source. Here the
smooth min-entropy of a classical system characterizes the amount of uniformly random key that
can be extracted such that it is independent of the side information. More precisely, we consider a
source that outputs a classical system $Z$ about which there exists side information $E$ (potentially
quantum) and ask how much uniform randomness, $S$, can be extracted from $Z$ such that it is
independent of the side information $E$.


7.3.1 Uniform and Independent Randomness 

The quality of the extracted randomness is measured using the trace distance to a perfect secret
key, which is uniform on $S$ and product with $E$. Namely, we consider the distance

$$\Delta(S|E)_{\rho} := \Delta(\rho_{SE}, \pi_S \otimes \rho_E), \tag{7.25}$$

where $\pi_S$ is the maximally mixed state. Due to the operational interpretation of the trace distance
as a distinguishing advantage, a small $\Delta$ implies that the extracted random variable cannot be
distinguished from a uniform and independent random variable with probability more than
$\frac{1}{2}(1+\Delta)$. This viewpoint is at the root of universally composable security frameworks (see,
e.g., [33, 167]), which ensure that a secret key satisfying the above property can safely be employed
in any (composable secure) protocol requiring a secret key.

A probabilistic protocol $\mathcal{F}$ extracting a key $S$ from $Z$ using a random seed $F$ is comprised of
the following:

• A set $\mathcal{F} = \{f\}$ of functions $f : Z \to S$ which are in one-to-one correspondence with the
standard basis elements $|f\rangle$ of $F$.
• A probability mass function $\tau \in \mathcal{S}_{\circ}(F)$.

The protocol then applies a function $f \in \mathcal{F}$ at random (according to the value in $F$) on the
input $Z$ to create the key $S$. Clearly, this process can be summarized by a classical channel
$\mathcal{F} \in \mathrm{CPTP}(Z,SF)$. More explicitly, we start with a classical-quantum state $\rho_{ZE}$ of the form

$$\rho_{ZE} = \sum_z |z\rangle\langle z|_Z \otimes \rho_E(z) = \sum_z \rho(z)\, |z\rangle\langle z|_Z \otimes \hat{\rho}_E(z), \qquad \hat{\rho}_E(z) \in \mathcal{S}_{\circ}(E). \tag{7.26}$$

The protocol will transform this state into $\rho_{SEF} = (\mathcal{F}_{Z \to SF} \otimes \mathcal{I}_E)(\rho_{ZE})$, where

$$\rho_{SEF} = \sum_f \tau(f)\, \rho^f_{SE} \otimes |f\rangle\langle f|_F, \quad \text{and} \tag{7.27}$$

$$\rho^f_{SE} = \sum_z |f(z)\rangle\langle f(z)|_S \otimes \rho_E(z) \tag{7.28}$$

is the state produced when $f$ is applied to the $Z$ system of $\rho_{ZE}$.
For such protocols, we then require that the average distance

$$\sum_f \tau(f)\, \Delta(S|E)_{\rho^f} = \Delta(S|EF)_{\rho} \tag{7.29}$$

is small, or, equivalently, we require that the extracted randomness is independent of the seed
$F$ as well as $E$. This is called the strong extractor regime in classical cryptography, and clearly
independence of $F$ is crucial as otherwise the extractor could simply output the seed. A randomness
extractor of the above form that satisfies the security criterion $\Delta(S|EF)_{\rho} \le \varepsilon$ is said to be
$\varepsilon$-secret.
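For purely classical side information the distance $\Delta(S|E)$ reduces to a sum over joint probabilities, which makes the security criterion easy to explore. The following sketch treats a toy source (two uniform bits $Z$, with the first bit leaked as $E$) and two deterministic candidate extractors; the source, the leakage model and the function names are hypothetical choices made for illustration.

```python
from itertools import product

def distance_from_ideal_key(joint, d_s):
    """Delta(S|E) = (1/2) * sum_{s,e} |P(s,e) - P(e)/d_s| for a classical pair (S, E)."""
    marginal_e = {}
    for (s, e), pr in joint.items():
        marginal_e[e] = marginal_e.get(e, 0.0) + pr
    total = 0.0
    for s in range(d_s):
        for e, pe in marginal_e.items():
            total += abs(joint.get((s, e), 0.0) - pe / d_s)
    return 0.5 * total

def extract(f, side_info):
    """Joint distribution of (f(Z), side_info(Z)) for Z uniform on two bits."""
    joint = {}
    for z in product((0, 1), repeat=2):
        key = (f(z), side_info(z))
        joint[key] = joint.get(key, 0.0) + 0.25
    return joint

def leak_first_bit(z):
    return z[0]

# Outputting the leaked bit gives a key that is fully known to the adversary.
print(distance_from_ideal_key(extract(lambda z: z[0], leak_first_bit), 2))         # 0.5
# The parity of both bits is uniform and independent of the leaked bit.
print(distance_from_ideal_key(extract(lambda z: z[0] ^ z[1], leak_first_bit), 2))  # 0.0
```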

Finally, the maximal number of bits of uniform and independent randomness that can be
extracted from a state $\rho_{ZE}$ is then defined as $\log_2 \ell^{\varepsilon}(Z|E)_{\rho}$, where

$$\ell^{\varepsilon}(Z|E)_{\rho} := \max\{ \ell \in \mathbb{N} : \exists\, \mathcal{F} \text{ s.t. } d_S = \ell \wedge \mathcal{F} \text{ is } \varepsilon\text{-secret} \}. \tag{7.30}$$

The classical Leftover Hash Lemma [91,92,114] states that the amount of extractable randomness
is at least the min-entropy of $Z$ given $E$. In fact, since hashing is an entirely classical process,
one might expect that the physical nature of the side information is irrelevant and that a purely
classical treatment is sufficient. This is, however, not true in general. For example, the output of
certain extractor functions may be partially known if side information about their input is stored
in a quantum device of a certain size, while the same output is almost uniform conditioned on any
side information stored in a classical system of the same size. (See [65] for a concrete example
and [100] for a more general discussion of this topic.)


7.3.2 Direct Bound: Leftover Hash Lemma 

A particular class of protocols that can be used to extract uniform randomness is based on
two-universal hashing [36]. A two-universal family of hash functions, in the language of the previous
section, satisfies

$$\Pr_{F \leftarrow \tau}\big[F(z) = F(z')\big] = \sum_f \tau(f)\, \delta_{f(z),f(z')} = \frac{1}{d_S} \qquad \forall\, z \ne z'. \tag{7.31}$$
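One concrete family with this property, used here purely as an illustration and not taken from the text, is the set of all binary matrices acting on bit strings over GF(2) with a uniform seed. The sketch below enumerates the family for 3-bit inputs and 2-bit outputs and computes the collision probability of (7.31) exactly for every pair of distinct inputs; all names are hypothetical.

```python
from fractions import Fraction
from itertools import product

N_IN, N_OUT = 3, 2   # hash 3-bit inputs to 2-bit outputs, so d_S = 4

def apply_matrix(rows, z):
    """Multiply the binary matrix (given by its rows) with the bit vector z over GF(2)."""
    return tuple(sum(r * b for r, b in zip(row, z)) % 2 for row in rows)

inputs = list(product((0, 1), repeat=N_IN))
family = list(product(inputs, repeat=N_OUT))   # all 2 x 3 binary matrices, as tuples of rows

def collision_probability(z1, z2):
    """Probability over a uniformly chosen matrix that z1 and z2 are mapped to the same key."""
    hits = sum(apply_matrix(f, z1) == apply_matrix(f, z2) for f in family)
    return Fraction(hits, len(family))

probabilities = {collision_probability(z1, z2)
                 for z1 in inputs for z2 in inputs if z1 != z2}
print(probabilities)   # {Fraction(1, 4)}
```

Every pair of distinct inputs collides with probability exactly $1/d_S$ because a uniformly random matrix maps any fixed non-zero difference $z \oplus z'$ to a uniformly random output. The price is a long seed, here $N_{\mathrm{in}} \cdot N_{\mathrm{out}}$ bits.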


Using two-universal hashing, Renner [139] established the following bound. 

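Proposition 7.1. Let $\rho_{ZE} \in \mathcal{S}_{\circ}(ZE)$. For every $\ell \in \mathbb{N}$, there exists a randomness
extractor as prescribed above such that

$$\Delta(S|EF)_{\rho} \le \exp\!\Big( \frac{1}{2}\big(\log \ell - H_{\min}(Z|E)_{\rho}\big) \Big). \tag{7.32}$$

We provide a proof that simplifies the original argument. We also note that instead of $H_{\min}$ one
can write $\bar{H}^{\uparrow}_{2}$ to get a tighter bound in (7.32).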

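Proof. We set $d_S = \ell$. Using the notation of the previous section, we have

$$\Delta(S|EF)_{\rho} = \frac{1}{2} \sum_f \tau(f)\, \big\| \rho^f_{SE} - \pi_S \otimes \rho_E \big\|_1. \tag{7.33}$$

We note that $\rho^f_E = \rho_E$ does not depend on $f$. Then, by Hölder’s inequality, for any
$\sigma_E \in \mathcal{S}_{\circ}(E)$ such that $\sigma_E \gg \rho^f_E$ for all $f$, we have

$$\big\| \rho^f_{SE} - \pi_S \otimes \rho_E \big\|_1 = \Big\| \sigma_E^{\frac{1}{2}}\, \sigma_E^{-\frac{1}{2}} \big(\rho^f_{SE} - \pi_S \otimes \rho_E\big) \Big\|_1 \tag{7.34}$$

$$\le \Big\| I_S \otimes \sigma_E^{\frac{1}{2}} \Big\|_2 \cdot \Big\| \sigma_E^{-\frac{1}{2}} \big(\rho^f_{SE} - \pi_S \otimes \rho_E\big) \Big\|_2 \tag{7.35}$$

$$= \sqrt{ d_S \operatorname{Tr}\Big( \sigma_E^{-1} \big(\rho^f_{SE} - \pi_S \otimes \rho_E\big)^2 \Big) }. \tag{7.36}$$

Hence, Jensen’s inequality applied to the square root function yields

$$\big( 2\Delta(S|EF)_{\rho} \big)^2 \le d_S \sum_f \tau(f) \operatorname{Tr}\Big( \sigma_E^{-1} \big(\rho^f_{SE} - \pi_S \otimes \rho_E\big)\big(\rho^f_{SE} - \pi_S \otimes \rho_E\big) \Big) \tag{7.37}$$

$$= d_S \Big[ \sum_f \tau(f) \operatorname{Tr}\big( \sigma_E^{-1} \rho^f_{SE}\, \rho^f_{SE} \big) - \frac{1}{d_S} \operatorname{Tr}\big( \sigma_E^{-1} \rho_E^2 \big) \Big], \tag{7.38}$$

where we used that $\pi_S = I_S / d_S$. Next, by the definition of $\rho^f_{SE}$ in (7.28), we find

$$\sum_f \tau(f) \operatorname{Tr}\big( \sigma_E^{-1} \rho^f_{SE}\, \rho^f_{SE} \big) \tag{7.39}$$

$$= \sum_{f,z,z'} \tau(f)\, \delta_{f(z),f(z')} \operatorname{Tr}\big( \sigma_E^{-1} \rho_E(z)\, \rho_E(z') \big) \tag{7.40}$$

$$= \sum_{z \ne z'} \frac{1}{d_S} \operatorname{Tr}\big( \sigma_E^{-1} \rho_E(z)\, \rho_E(z') \big) + \sum_z \operatorname{Tr}\big( \sigma_E^{-1} \rho_E(z)\, \rho_E(z) \big) \tag{7.41}$$

$$= \frac{1}{d_S} \operatorname{Tr}\big( \sigma_E^{-1} \rho_E^2 \big) + \Big( 1 - \frac{1}{d_S} \Big) \sum_z \operatorname{Tr}\big( \sigma_E^{-1} \rho_E(z)^2 \big). \tag{7.42}$$

Substituting this into (7.38), we observe that two terms cancel, and optimizing over $\sigma_E$ we find

$$\Delta(S|EF)_{\rho} \le \frac{1}{2} \sqrt{ d_S \exp\big( -\bar{H}^{\uparrow}_{2}(Z|E)_{\rho} \big) }, \tag{7.43}$$

where we used the definition of $\bar{H}^{\uparrow}_{2}(Z|E)_{\rho}$. The desired bound then follows since
$\bar{H}^{\uparrow}_{2}(Z|E)_{\rho} \ge H_{\min}(Z|E)_{\rho}$ according to Corollary 5.3. □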

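From the definition of $\ell^{\varepsilon}(Z|E)_{\rho}$ we can then directly deduce that

$$\log \ell^{\varepsilon}(Z|E)_{\rho} \ge \bar{H}^{\uparrow}_{2}(Z|E)_{\rho} - 2\log\frac{1}{\varepsilon} \ge H_{\min}(Z|E)_{\rho} - 2\log\frac{1}{\varepsilon}. \tag{7.44}$$

This can then be generalized using the smoothing technique as follows:

Corollary 7.1. The same statement as in Proposition 7.1 holds with

$$\Delta(S|EF)_{\rho} \le \exp\!\Big( \frac{1}{2}\big(\log \ell - H^{\varepsilon}_{\min}(Z|E)_{\rho}\big) \Big) + 2\varepsilon. \tag{7.45}$$

Proof. Let $\tilde{\rho}_{ZE}$ be a state maximizing $H^{\varepsilon}_{\min}(Z|E)_{\rho} = H_{\min}(Z|E)_{\tilde{\rho}}$. Then,
Proposition 7.1 yields

$$\Delta(S|EF)_{\tilde{\rho}} \le \exp\!\Big( \frac{1}{2}\big(\log \ell - H^{\varepsilon}_{\min}(Z|E)_{\rho}\big) \Big). \tag{7.46}$$

Moreover, employing the triangle inequality twice, we find that
$\Delta(S|EF)_{\rho} \le \Delta(S|EF)_{\tilde{\rho}} + 2\varepsilon$. □

This result can also be written in the following form:

$$\log \ell^{\varepsilon}(Z|E)_{\rho} \ge H^{\varepsilon_1}_{\min}(Z|E)_{\rho} - 2\log\frac{1}{\varepsilon_2}, \quad \text{where} \quad \varepsilon = 2\varepsilon_1 + \varepsilon_2. \tag{7.47}$$

Note that the protocol families discussed above work on any state $\rho_{ZE}$ with sufficiently high
min-entropy, i.e. they do not take into account other properties of the state. Next, we will see that
these protocols are essentially optimal.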
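The direct bound can be checked exactly in the simplest classical setting without side information, where the collision entropy reduces to $-\log\sum_z p(z)^2$ and the leftover hash bound reads $\Delta \le \frac{1}{2}\sqrt{d_S \sum_z p(z)^2}$. The sketch below hashes a biased 3-bit source with uniformly random binary matrices over GF(2), averages the trace distance to the uniform key over all seeds, and compares the result with the collision-entropy and min-entropy bounds. The source distribution, the hash family and all names are assumptions made for illustration.

```python
import math
from itertools import product

N_IN, N_OUT = 3, 1
D_S = 2 ** N_OUT

def apply_matrix(rows, z):
    """Multiply the binary matrix (given by its rows) with the bit vector z over GF(2)."""
    return tuple(sum(r * b for r, b in zip(row, z)) % 2 for row in rows)

inputs = list(product((0, 1), repeat=N_IN))
outputs = list(product((0, 1), repeat=N_OUT))
family = list(product(inputs, repeat=N_OUT))   # two-universal family with a uniform seed

# Hypothetical biased source on three bits (the probabilities sum to one).
weights = [0.30, 0.20, 0.15, 0.10, 0.10, 0.05, 0.05, 0.05]
source = dict(zip(inputs, weights))

def average_distance():
    """Delta(S|F): trace distance to the uniform key, averaged over the uniform seed."""
    total = 0.0
    for f in family:
        key = dict.fromkeys(outputs, 0.0)
        for z, pz in source.items():
            key[apply_matrix(f, z)] += pz
        total += 0.5 * sum(abs(p - 1.0 / D_S) for p in key.values())
    return total / len(family)

collision_entropy = -math.log(sum(p * p for p in source.values()))
min_entropy = -math.log(max(source.values()))
bound_h2 = 0.5 * math.sqrt(D_S * math.exp(-collision_entropy))
bound_hmin = 0.5 * math.sqrt(D_S * math.exp(-min_entropy))
print(average_distance(), bound_h2, bound_hmin)
```

The collision-entropy bound is never worse than the min-entropy bound, which mirrors the remark that replacing $H_{\min}$ by a collision-type entropy tightens the statement.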


7.3.3 Converse Bound 

We prove a converse bound by contradiction. Assume for the sake of the argument that we have
an $\varepsilon$-secret protocol that extracts $\log \ell > H^{\varepsilon'}_{\min}(Z|E)_{\rho}$ bits of randomness, where
$\varepsilon' = \sqrt{2\varepsilon - \varepsilon^2}$. Then, due to Proposition 6.4 we know that applying a function on $Z$
cannot increase the smooth min-entropy, thus

$$\forall f \in \mathcal{F} : \quad H^{\varepsilon'}_{\min}(S|E)_{\rho^f} \le H^{\varepsilon'}_{\min}(Z|E)_{\rho} < \log \ell. \tag{7.48}$$

This in turn implies that $\sum_f \tau(f)\, \Delta(S|E)_{\rho^f} > \varepsilon$ as the following argument shows. The above
inequality as well as the definition of the smooth min-entropy implies that all states $\tilde{\rho}$ with

$$P(\tilde{\rho}_{SE}, \rho^f_{SE}) \le \varepsilon' \quad \text{or} \quad \Delta(\tilde{\rho}_{SE}, \rho^f_{SE}) \le \varepsilon \tag{7.49}$$

necessarily satisfy $H_{\min}(S|E)_{\tilde{\rho}} < \log \ell$. (The latter statement follows from the Fuchs-van de Graaf
inequalities in Lemma 3.5.) In particular, these close states can thus not be of the form
$\pi_S \otimes \rho_E$, because such states have min-entropy $\log \ell$. Thus, $\Delta(S|E)_{\rho^f} > \varepsilon$.

Since this contradicts our initial assumption that the protocol is $\varepsilon$-secret, we have established
the following converse bound:

$$\log \ell^{\varepsilon}(Z|E)_{\rho} \le H^{\varepsilon'}_{\min}(Z|E)_{\rho}. \tag{7.50}$$

Collecting (7.47) and (7.50), we arrive at the following theorem.


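Theorem 7.7. Let $\rho_{ZE} \in \mathcal{S}_{\circ}(ZE)$ be classical on $Z$ and let $\varepsilon \in (0,1)$. Then,

$$H^{\varepsilon'}_{\min}(Z|E)_{\rho} - 2\log\frac{1}{\delta} \le \log \ell^{\varepsilon}(Z|E)_{\rho} \le H^{\varepsilon''}_{\min}(Z|E)_{\rho}, \tag{7.51}$$

for any $\delta \in (0,\varepsilon)$, $\varepsilon' = \frac{\varepsilon - \delta}{2}$ and $\varepsilon'' = \sqrt{2\varepsilon - \varepsilon^2}$.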


We have thus established that the extractable uniform and independent randomness is
characterized by the smooth min-entropy, in the above sense. One could now analyze this bound
further by choosing an $n$-fold iid product state and then apply the AEP to find the asymptotics
of $\frac{1}{n}\log \ell^{\varepsilon}(Z^n|E^n)_{\rho^{\otimes n}}$ for large $n$. More precisely, using (6.107) we can verify that the
upper and lower bounds on this quantity agree in the first order but disagree in the second order.
In particular, the dependence on $\varepsilon$ is qualitatively different in the upper and lower bound. Thus,
one could certainly argue that the bounds in Theorem 7.7 are not as tight as they should be in the
asymptotic limit. We omit a more detailed discussion of this here (see [157] instead) since most
applications consider the task of randomness extraction only in the one-shot setting where the
resource state is unstructured.


7.4 Background and Further Reading 

The quantum Chernoff bound has been established by Nussbaum and Szkola [127] (converse) and 
Audenaert et al. [10] (achievability). Quantum Stein’s Lemma was shown by Hiai and Petz [86] 
(achievability and weak converse) and Ogawa and Nagaoka [128] (strong converse). Its second 
order refinement was proven independently by Li [105] and in [157]. The quantum Hoeffding 
bound was established by Hayashi [76] (achievability) and Nagaoka [123] (converse). Audenaert 
et al. [12] provide a good review of these results. The optimal strong converse exponent was 
recently established by Mosonyi and Ogawa [119]. 

The limiting cases $\alpha = \beta = 1$ and $\alpha \to \infty$, $\beta \to \frac{1}{2}$ of the tripartite Maassen-Uffink
entropic UR in Theorem 7.6 were first shown by Berta et al. [20] and in [159], respectively. The
former was first conjectured and proven in a special case by Renes and Boileau [137] and extended
to infinite-dimensional systems [56,61]. Here we follow a simplified proof strategy due to Coles
et al. [37]. The exact result presented here can be found in [122]. Tripartite URs in the spirit of
Section 7.2 can also be shown for smooth min- and max-entropies, both for the case of discrete
observables in [159], and for the case of continuous observables (e.g. position and momentum)
by Furrer et al. [61]. These entropic URs lie at the core of security proofs for quantum key
distribution [62,158].

There exist other protocol families that extract the min-entropy against quantum adversaries, 
for example based on almost two-universal hashing [160] or Trevisan’s extractors [46]. These 
families are considered mainly because they need a smaller seed or can be implemented more 
efficiently than two-universal hashing. 



Appendix A 

Some Fundamental Results in Matrix Analysis 


One of the main technical ingredients of our derivations are the properties of operator monotone 
and concave functions. While a comprehensive discussion of their properties is outside the scope 
of this book, we will provide an elementary proof of the Lieb-Ando Theorem in (2.50) and the 
joint convexity of relative entropy, which lie at the heart of our derivations. 


Preparatory Lemmas 


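We follow the proof strategy of Ando [4], although highly specialized to the problem at hand.
We restrict our attention to finite-dimensional positive definite matrices here and start with the
following well-known result:

Lemma A.1. Let $A, B$ be positive definite, and $X$ linear. We have

$$\begin{pmatrix} A & X \\ X^{\dagger} & B \end{pmatrix} \ge 0 \quad \iff \quad A \ge X B^{-1} X^{\dagger}. \tag{A.1}$$

Proof. Since the matrix $\begin{pmatrix} I & -XB^{-1} \\ 0 & I \end{pmatrix}$ is invertible, we find that
$\begin{pmatrix} A & X \\ X^{\dagger} & B \end{pmatrix} \ge 0$ holds if and only if

$$\begin{pmatrix} I & -XB^{-1} \\ 0 & I \end{pmatrix} \begin{pmatrix} A & X \\ X^{\dagger} & B \end{pmatrix} \begin{pmatrix} I & 0 \\ -B^{-1}X^{\dagger} & I \end{pmatrix} = \begin{pmatrix} A - XB^{-1}X^{\dagger} & 0 \\ 0 & B \end{pmatrix} \ge 0, \tag{A.2}$$

from which the assertion follows. □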
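Lemma A.1 is easy to test numerically. The sketch below evaluates both sides of the equivalence (A.1) for small real matrices, once with $A$ chosen above $XB^{-1}X^{\dagger}$ and once below it; the specific matrices and function names are illustrative assumptions.

```python
import numpy as np

def is_psd(m, tol=1e-10):
    """True if the symmetric matrix m has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh(m).min() >= -tol)

def both_sides(a, x, b):
    """Evaluate the two conditions of Lemma A.1 for real symmetric A, B and real X."""
    block = np.block([[a, x], [x.T, b]])
    schur = a - x @ np.linalg.inv(b) @ x.T
    return is_psd(block), is_psd(schur)

b = np.array([[2.0, 1.0], [1.0, 2.0]])
x = np.array([[1.0, 0.0], [1.0, 1.0]])
s = x @ np.linalg.inv(b) @ x.T            # X B^{-1} X^T, eigenvalues 1 and 1/3

# A = S + I dominates S, so both conditions hold.
print(both_sides(s + np.eye(2), x, b))                # (True, True)
# A = S/2 + 0.01 I is positive definite but does not dominate S, so both fail.
print(both_sides(0.5 * s + 0.01 * np.eye(2), x, b))   # (False, False)
```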


From this we can then derive two elementary results:

Lemma A.2. The map $(A,B) \mapsto BA^{-1}B$ is jointly convex and the map
$(A,B) \mapsto (A^{-1} + B^{-1})^{-1}$ is jointly concave.

The latter expression is proportional to the matrix harmonic mean $A\,!\,B = 2(A^{-1} + B^{-1})^{-1}$, and its
joint concavity was first shown in [3].

Proof. Let $A_1, A_2, B_1, B_2$ be positive definite. Then, by Lemma A.1, for any $\lambda \in [0,1]$, we have

$$0 \le \lambda \begin{pmatrix} A_1 & B_1 \\ B_1 & B_1 A_1^{-1} B_1 \end{pmatrix} + (1-\lambda) \begin{pmatrix} A_2 & B_2 \\ B_2 & B_2 A_2^{-1} B_2 \end{pmatrix} \tag{A.3}$$

$$= \begin{pmatrix} \lambda A_1 + (1-\lambda) A_2 & \lambda B_1 + (1-\lambda) B_2 \\ \lambda B_1 + (1-\lambda) B_2 & \lambda B_1 A_1^{-1} B_1 + (1-\lambda) B_2 A_2^{-1} B_2 \end{pmatrix} \tag{A.4}$$

and, invoking Lemma A.1 once again, we conclude that

$$\lambda B_1 A_1^{-1} B_1 + (1-\lambda) B_2 A_2^{-1} B_2 \ge \big(\lambda B_1 + (1-\lambda) B_2\big) \big(\lambda A_1 + (1-\lambda) A_2\big)^{-1} \big(\lambda B_1 + (1-\lambda) B_2\big), \tag{A.5}$$

establishing joint convexity of the first map.

To investigate the second map, we use a Woodbury matrix identity,

$$(A^{-1} + B^{-1})^{-1} = B - B(A+B)^{-1}B, \tag{A.6}$$

which can be verified by multiplying both sides with $A^{-1} + B^{-1}$ from either side and simplifying
the resulting expression. To conclude the proof, we note that $B(A+B)^{-1}B$ is jointly convex due
to the first statement and the fact that $A + B$ is linear in $A$ and $B$. □

As a simple corollary of this we find that $A \mapsto A^{-1}$ and $B \mapsto B^2$ are convex.
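Both the Woodbury identity (A.6) and the joint concavity of $(A,B) \mapsto (A^{-1}+B^{-1})^{-1}$ can be spot-checked numerically on random positive definite matrices. The following sketch does this at a single mixing weight; the random sampling scheme, the dimension and the names are assumptions chosen for illustration, and a single passing check is of course no substitute for the proof.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_pd(d):
    """Random real positive definite matrix with eigenvalues at least one."""
    g = rng.normal(size=(d, d))
    return g @ g.T + np.eye(d)

def parallel_sum(a, b):
    """(A^{-1} + B^{-1})^{-1}, i.e. half the matrix harmonic mean."""
    return np.linalg.inv(np.linalg.inv(a) + np.linalg.inv(b))

a1, a2, b1, b2 = (random_pd(3) for _ in range(4))

# Identity (A.6): (A^{-1} + B^{-1})^{-1} = B - B (A + B)^{-1} B.
woodbury_rhs = b1 - b1 @ np.linalg.inv(a1 + b1) @ b1
identity_holds = bool(np.allclose(parallel_sum(a1, b1), woodbury_rhs))

# Joint concavity at weight lam: the gap below should be positive semi-definite.
lam = 0.5
mixed = parallel_sum(lam * a1 + (1 - lam) * a2, lam * b1 + (1 - lam) * b2)
gap = mixed - (lam * parallel_sum(a1, b1) + (1 - lam) * parallel_sum(a2, b2))
min_gap_eigenvalue = np.linalg.eigvalsh(gap).min()
print(identity_holds, min_gap_eigenvalue)
```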


Proof of Lieb-Ando Theorem 

Let us now state Lieb and Ando’s results [4, 106]. 


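Theorem A.1. The map $(A,B) \mapsto A^{\alpha} \otimes B^{1-\alpha}$ on positive definite operators is jointly
concave for $\alpha \in (0,1)$ and jointly convex for $\alpha \in (-1,0) \cup (1,2)$.


Proof. Using contour integration one can verify that
$\int_0^{\infty} (1+\lambda)^{-1} \lambda^{\alpha-1}\, d\lambda = \pi \sin(\alpha\pi)^{-1}$ for $\alpha \in (0,1)$. By the change of
variable $\lambda \to \mu = t\lambda$, we then find the following integral representation for all $\alpha \in (0,1)$
and $t \ge 0$:

$$t^{\alpha} = \frac{\sin(\alpha\pi)}{\pi} \int_0^{\infty} \mu^{\alpha-1}\, \frac{t}{\mu + t}\, d\mu. \tag{A.7}$$

Let us now first consider the case $\alpha \in (0,1)$. Using (A.7), we write

$$A^{\alpha} \otimes B^{1-\alpha} = (A \otimes B^{-1})^{\alpha-1} \cdot A \otimes I \tag{A.8}$$

$$= \frac{\sin(\alpha\pi)}{\pi} \int_0^{\infty} \mu^{\alpha-1}\, \big(\mu I \otimes I + A \otimes B^{-1}\big)^{-1} A \otimes I\, d\mu. \tag{A.9}$$

Thus, it suffices to show joint concavity for every term in the integrand, i.e. for the map

$$(A,B) \mapsto \big(\mu I \otimes I + A \otimes B^{-1}\big)^{-1} A \otimes I = \big(\mu A^{-1} \otimes I + I \otimes B^{-1}\big)^{-1} \tag{A.10}$$

and all $\mu \ge 0$. This is a direct consequence of the second statement of Lemma A.2.

Next, we consider the case $\alpha \in (1,2)$. We again write this as

$$A^{\alpha} \otimes B^{1-\alpha} = (A \otimes B^{-1})^{\alpha-1} \cdot A \otimes I \tag{A.11}$$

$$= \frac{\sin((\alpha-1)\pi)}{\pi} \int_0^{\infty} \mu^{\alpha-2}\, (A \otimes B^{-1}) \big(\mu I \otimes I + A \otimes B^{-1}\big)^{-1} A \otimes I\, d\mu. \tag{A.12}$$

The integrand here simplifies to

$$(A \otimes B^{-1}) \big(\mu I \otimes I + A \otimes B^{-1}\big)^{-1} A \otimes I = A \otimes I\, \big(\mu I \otimes B + A \otimes I\big)^{-1} A \otimes I. \tag{A.13}$$

However, the first statement of Lemma A.2 asserts that the latter expression is jointly convex in
the arguments $A \otimes I$ and $\mu I \otimes B + A \otimes I$. And, moreover, since they are linear in $A$ and $B$, it follows
that the integrand is jointly convex for all $\mu \ge 0$. The remaining case follows by symmetry. □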
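The scalar integral representation (A.7) that drives this proof can be verified numerically. The sketch below substitutes $\mu = e^{x}$, which turns the integrand into a function decaying exponentially in both directions, and applies the trapezoidal rule on a truncated interval. The truncation width, the step count and the function name are assumptions; the chosen width is adequate for $\alpha$ roughly in $[0.3, 0.7]$ and would need to be enlarged for $\alpha$ close to 0 or 1.

```python
import math

def power_via_integral(t, alpha, half_width=100.0, steps=20000):
    """Approximate t**alpha via (A.7) after substituting mu = exp(x) (trapezoidal rule)."""
    h = 2 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = -half_width + k * h
        mu = math.exp(x)
        # mu^(alpha-1) * t/(mu+t) * dmu/dx, with dmu/dx = mu
        value = mu**alpha * t / (mu + t)
        total += value if 0 < k < steps else 0.5 * value
    return math.sin(alpha * math.pi) / math.pi * h * total

for t, alpha in [(2.5, 0.3), (0.7, 0.5), (4.0, 0.7)]:
    print(t, alpha, power_via_integral(t, alpha), t**alpha)
```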

The joint convexity and concavity of the trace functional in (2.50) now follows by the argument
presented in Section 2.5, which allows us to write

$$\operatorname{Tr}\big(A^{\alpha} K B^{1-\alpha} K^{\dagger}\big) = \langle\Psi|\, (K^{\dagger} \otimes I)\, \big(A^{\alpha} \otimes (B^{T})^{1-\alpha}\big)\, (K \otimes I)\, |\Psi\rangle. \tag{A.14}$$

This thus gives us a compact proof of Lieb’s Concavity Theorem and Ando’s Convexity Theorem.
Finally, we can also relax the condition that $A$ and $B$ are positive definite by choosing
$A' = A + \varepsilon I$ and $B' = B + \varepsilon I$ and taking the limit $\varepsilon \to 0$. Choosing $K = I$, we find that
this limit exists as long as we require that $B \gg A$ if $\alpha > 1$.


Joint Convexity of Relative Entropy 

As a bonus we will use the above techniques to show that the relative entropy is jointly convex, 
thereby providing a compact proof of strong sub-additivity. 

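Theorem A.2. The map $(A,B) \mapsto A\log(A) \otimes I - A \otimes \log(B)$ is jointly convex.


Proof. It suffices to prove this statement for the natural logarithm. We will use the representation

$$\ln(t) = \int_0^{\infty} \Big( \frac{1}{\mu + 1} - \frac{1}{\mu + t} \Big)\, d\mu. \tag{A.15}$$

Using this integral representation, we then write

$$A\log(A) \otimes I - A \otimes \log(B) = \log\big(A \otimes B^{-1}\big) \cdot A \otimes I \tag{A.16}$$

$$= \int_0^{\infty} \Big( \frac{A \otimes I}{\mu + 1} - \big(\mu I \otimes I + A \otimes B^{-1}\big)^{-1} \cdot A \otimes I \Big)\, d\mu \tag{A.17}$$

$$= \int_0^{\infty} \Big( \frac{A \otimes I}{\mu + 1} - \big(\mu A^{-1} \otimes I + I \otimes B^{-1}\big)^{-1} \Big)\, d\mu. \tag{A.18}$$

Invoking Lemma A.2, we can check that the integrand is jointly convex for all $\mu \ge 0$.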
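The operator inequality behind Theorem A.2 can be spot-checked numerically: for random positive definite pairs and a few mixing weights, the convex combination of the map's values minus its value at the mixture should be positive semi-definite. The random sampling, the dimension and the function names below are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def random_pd(d):
    """Random real positive definite matrix with eigenvalues at least one."""
    g = rng.normal(size=(d, d))
    return g @ g.T + np.eye(d)

def mlog(a):
    """Matrix logarithm (natural) of a positive definite symmetric matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.T

def f(a, b):
    """The operator A log(A) (x) I - A (x) log(B) of Theorem A.2."""
    eye = np.eye(b.shape[0])
    return np.kron(a @ mlog(a), eye) - np.kron(a, mlog(b))

a1, a2, b1, b2 = (random_pd(2) for _ in range(4))
min_eigs = []
for lam in (0.25, 0.5, 0.75):
    mix = f(lam * a1 + (1 - lam) * a2, lam * b1 + (1 - lam) * b2)
    gap = lam * f(a1, b1) + (1 - lam) * f(a2, b2) - mix
    min_eigs.append(np.linalg.eigvalsh(gap).min())   # non-negative up to rounding
print(min_eigs)
```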


As an immediate corollary, we find that

$$D(\rho\|\sigma) = \operatorname{Tr}(\rho\log\rho - \rho\log\sigma) = \langle\Psi|\, \big(\rho\log\rho \otimes I - \rho \otimes \log\sigma^{T}\big)\, |\Psi\rangle \tag{A.19}$$

is jointly convex in $\rho$ and $\sigma$. This in turn implies the data-processing inequality for the
relative entropy using Uhlmann’s trick as discussed in Proposition 4.2. In particular, we find strong
subadditivity if we apply the data-processing inequality for the partial trace:

$$H(ABC)_{\rho} - H(BC)_{\rho} = -D(\rho_{ABC}\|I_A \otimes \rho_{BC}) \tag{A.20}$$

$$\le -D(\rho_{AB}\|I_A \otimes \rho_B) = H(AB)_{\rho} - H(B)_{\rho}. \tag{A.21}$$